Introduction
Cacao, Theobroma cacao L., Malvaceae sensu lato [1], is an important Neotropical, perennial crop, on which the thriving global cocoa and chocolate industry is based. The World Cocoa Foundation, in 2012, reported that 40–50 million people worldwide depend on cocoa for their livelihood. The global export value of cocoa beans has been fluctuating between 8 and 10.5 billion USD over the past decade and the global chocolate market was valued at USD 106.6 billion in 2019 [2]. T. cacao is a diploid (2n = 20), allogamous species. Its genome is small, reported by Argout et al. [3] to span 411–494 Mb. Its putative centre of genetic diversity is at the headwaters of the Amazon River, South America [4], and it is indigenous to the Amazon and Orinoco rainforests.
In the current context of climate change and disease evolution, it is important to acquire a comprehensive understanding of the genetic architecture of traits of interest such as disease and pest resistance, high yield, desirable flavour, favourable flavanol (antioxidant) content, self-compatibility, and other traits of economic relevance or with potential health benefits [5–8]. Of these, yield and disease resistance traits are mainly polygenic [9, 10] and tagging such traits with molecular markers will facilitate early selection for the desired genotypes and formulation of breeding strategies to create new, adapted varieties. The considerable progress made in cacao genomics in the last decade [7, 11–34], along with the increase in the number of available markers, will help to accelerate progress in improvement of cacao planting material.
Genome-wide association studies (GWAS) have been used widely in breeding to detect significant population-wide associations between genetic markers, such as Single Nucleotide Polymorphisms (SNPs), and causal polymorphisms, with functional roles in controlling traits of interest like yield potential [35, 36]. The search for genotype-phenotype correlations is conducted in unrelated individuals [37, 38], and is based on non-random association of alleles in the population and linkage disequilibrium (LD) [39, 40]. Robust GWAS models, using population structure and kinship matrices as co-variates, facilitate the identification of mainly non-spurious associations [36].
This application of GWAS in T. cacao L. is facilitated by the mapping of the cacao genome and several previous studies. Allegre et al. [33] and Fouet et al. [32] identified and mapped SNPs and microsatellite (SSR) markers useful as expressed sequence tags (ESTs) and constructed a high-density genetic map for T. cacao. Both kinds of markers are co-dominant and thus powerful for genetic analysis. Allegre et al. [33], mapped 851 SNP markers corresponding to genes with putative functions and displaying a distinct polymorphic pattern across a selection of cacao germplasm. These SNPs are located within a gene expressed sequence and thus valuable for identifying candidate genes with functional roles in cacao. The data are available at http://tropgenedb.cirad.fr.
The sequencing of the cacao genome [3, 29] should facilitate the identification of candidate genes for traits of interest, which are co-localized with marker-trait associations (MTAs). Version 2.0 of the Criollo genome of Argout et al. [29] has 99% of the assembly anchored to the 10 chromosomes of the T. cacao L. genome. This resource was utilized in this study.
The objectives of this research were to exploit the naturally occurring genetic variation in a large collection of cacao trees of diverse origin, conserved ex situ, to perform GWAS to identify SNP markers significantly associated with phenotypic traits, examine linkage disequilibrium patterns on the chromosomes, and locate putative annotated candidate genes that encode yield-related and other traits. A genomic selection approach was then used to establish predictive values for phenotypic traits of interest in order to examine the efficiency of this breeding strategy to improve cacao yield.
Materials and methods
Plant material
The germplasm studied consisted of four hundred and twenty-one (421) cacao accessions, including 263 wild genotypes (collected in the Amazon Basin [41]), which are conserved at the International Cocoa Genebank Trinidad (ICGT). Complete phenotypic data were collated for 346 of these accessions and incomplete data were accrued for 75 accessions (S1 Table http://dx.doi.org/10.13140/RG.2.2.16179.71202 and S2 Table http://dx.doi.org/10.13140/RG.2.2.29601.48484). The accessions represent 23 “accession groups”, as described by Bekele et al. [41], as well as most of the genetic groups defined by Motamayor et al. [42], Wild cacao types such as those of the AMAZ, GU, IMC, MO, NA, PA, POUND, RB, SCA and SPEC (1–54) accession groups [41], which have evolved over a long period of time, were included to improve the power of detection of associations between SNP markers and phenotypic traits of interest as recommended by Stack et al. [40].
Plant material was collected from the 421 accessions at the ICGT. Sample collection spanned the period 1992–2012. The ICGT is situated at the University Cocoa Research Station, Centeno, North-East Trinidad at an altitude of 15 m above sea level. Shade is provided by trees of Erythrina sp. and bananas (Musa sp.) planted 6 m and 4 m apart, respectively. Each plot typically had 16 trees of a specific accession, which are planted 1.8 m apart. The soil type is Cunupia fine sandy clay with restricted internal drainage.
Over a 30-year period from 1981, the average temperature in this region was 26.3ºC. It was 26ºC for the period 1961 to 1991. This satisfied the optimal temperature requirements for growing cacao. The mean annual rainfall for 1981 to 2011 was 1945.2 mm, lower than the 2,392 mm recorded for 1961 to 1991 (Trinidad and Tobago Meteorological Office https://www.metoffice.gov.tt). The trees received irrigation as necessary during the dry season (January-June) each year. The plots were maintained with standard production practices regarding weeding and pruning. However, no pesticides or fungicides were applied to allow monitoring of the natural reactions of the trees to pest and diseases and yield parameters. Fertilizers were applied at planting and regularly for only young trees and the trees were maintained within a low input system.
Phenotypic evaluation
The cacao accessions were assessed in terms of 27 flower, fruit and seed traits (Table 1) as described by Bekele et al. and Bekele and Butler [41, 43]. The traits studied were found to be the most discriminative and taxonomically useful descriptors, which avoid redundancy. They were also selected for ease of observation, reliability of scoring, and, in the case of seed characters, agronomic and economic value [44–46]. When possible, the full complement of samples was collected for each accession at a given time, but in many cases, samples had to be collected over multiple years to obtain the full sample size. In these cases, sampling was conducted during the same season to preclude the effect of seasonality on phenotypic trait expression. The fruits characterized were the products of open pollination. These data are available online in the International Cocoa Germplasm Database (ICGD) (http://www.icgd.rdg.ac.uk/).
[Figure omitted. See PDF.]
Genotyping by sequencing
SNP markers that provide good coverage of the cacao genome [33] were employed in this study. The selection of 836 SNPs in coding sequences, which displayed significant similarity with known protein sequences, as described by Argout et al. [34], was carried out by Allegre et al. [33]. Illumina SNP genotyping was performed with the Illumina BeadArray platform at the French National Genotyping Centre (CNG, CEA-IG, Evry, France), according to the GoldenGate Assay manufacturer’s protocol. The genotype calling of each marker was verified using reference genotypes and filtered, as described by Argout et al. [34] and Allegre et al. [33]. The QualitySNP pipeline was used for detection of SNPs in the unigenes. All of the SNPs employed for genotyping have been identified in orthologous genes or gene families and this facilitates reference to genetic information, made available via the genome browser, CocoaGen DB (http://cocoa-genome-hub.southgreen.fr/jbrowse) and UniProtKB (https://www.uniprot.org/uniprot/).
Statistical analyses
Phenotypic data analysis.
Qualitative data such as fruit shape classes were first converted to binary form. The quantitative traits that were found to deviate from normality were log-transformed. Tests of normality, log transformation of data that were not normally distributed, derivation of descriptive statistics, ANOVA (General Linear Model for unbalanced data) of log-transformed data, and correlation analysis of the collated quantitative phenotypic data were performed using Minitab Version 18. The correlation matrices for the quantitative yield-related and anthocyanin pigmentation traits studied were calculated using command cor() and visualised with the ‘corrplot’ package in R [47]. Pearson and Spearman correlation coefficients were derived for the continuous and interval data, respectively.
Determining associations between molecular polymorphisms and quantitative traits.
Population structure. Population structure was determined to allow estimation of marker-trait associations without including spurious associations [48]. This was necessary to satisfy the independence assumption under the null hypothesis on which the marker-trait association was based [49]. The Bayesian clustering software, STRUCTURE, [49–53] was employed for this purpose. It defined the inferred ancestry of individuals, studied as coefficients of the individuals across sub-populations. Individuals with coefficients of membership of less than 0.7 were classified as admixed. The allele frequencies correlated model used Markov Chain Monte Carlo (MCMC) simulations to estimate the group (cluster, K) membership of each individual studied, assuming Hardy-Weinberg and linkage equilibrium within groups, random mating within populations and free recombination between loci [49].
Multi-locus genotype data for 200 SNPs, distributed over all 10 chromosomes, with minor allele frequency (MAF) greater than 0.05 and low missing values (less than 10%) were analysed in STRUCTURE to describe and visualize population structure, based on allele frequencies of the data. The optimum K value that best defined the population structure was identified using the admixture model of ancestry, assuming correlated allele frequencies for K = 2 to 15 with 150,000 iterations during the burn-in period, 150,000 Markov Chain Monte Carlo repetitions and 10 independent runs for each genetic sub-population (K2- K15).
The STRUCTURE outputs were analysed to infer optimal K based on the method described by Pritchard [49]. The optimal K was chosen by plotting the log probability of the data, Pr (X | K), against a range of K values and selecting the one after which the curve formed a plateau, while also considering the consistency of the groupings across multiple runs with the same K. Runs for which the variance was not homogeneous with variances of the other runs with the same K value were excluded.
The population structure of the 421 accessions studied was visualized using DARwin (Dissimilarity Analysis and Representation for Windows) version 6 (http://darwin.cirad.fr) [54]. DARwin was used to estimate pairwise Jaccard’s genetic dissimilarity indices using the 200 SNP markers employed in STRUCTURE Analysis. A tree was constructed by clustering accessions, based on a dissimilarity matrix using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). Clade strength in the dendrogram was tested using 1000 bootstraps. This tree was rendered with iTOL (https://itol.embl.de/).
Genome-wide Association Studies (GWAS). TASSEL [55], version 5.2.50 for Windows, was employed to conduct GWAS and the results for yield-related traits were compared with those derived with GAPIT FarmCPU (Fixed and random model for circulating probability unification) [56]. In TASSEL, the Mixed Linear model (MLM) tested fixed effects to determine associations between genetic sites and phenotypes. FarmCPU was employed to verify the results generated with TASSEL MLM since it caters for random as well as fixed effects in the MLM, and removes false positive and negative associations [56].
MLM in TASSEL was used to correct covariances due to relatedness at the population level between genotypes (due to population structure) as well as co-ancestry (kinship) or identity by descent. The inclusion of the K (kinship) matrix allowed the inclusion of multiple backgrounds QTL as a random factor in the mixed linear model, as explained by Henderson [57]. The scaled, centred ‘IBS’ method, described by Endelman and Jannink [58], was used to generate the Kinship matrix of relationships among genotypes using TASSEL.
The statistical model used for the TASSEL MLM is as follows:
Where Y is the vector of observations, β an unknown vector of fixed effects including genetic marker and population structure (Q); u is an unknown vector of random additive genetic effects from multiple background QTL for the individuals; X and Z are the known design matrices; and e is the unobserved vector of random residuals.
Each marker allele was fit as a separate class with heterozygotes fit as an additional class. The minor allele frequency (MAF) was set to > 0.05. SNPs with more than 14% missing values were removed and imputation was performed to replace any other missing values using the Euclidean distance measure in the k-nearest-neighbour algorithm [59] generated for the 10 nearest neighbours [55]. Phenotypic data for 75 accessions were imputed based on their genetic profiles.
The stringent Bonferroni correction test to cater for multiple testing [55] was applied to the derived P values to test the significance of the associations between traits and markers. Manhattan plots, generated in R [47], were also used to check for evidence of P value inflation and to identify significant MTAs based on TASSEL MLM.
Simple M in R [47] was used to determine the number of independent markers that should be used in testing to avoid the penalty of the stringent Bonferroni correction test, as prescribed by Gao et al. [60]. There were 664 SNP markers that were independent.
The LD pattern was tested with 100,000 permutations in TASSEL to obtain P values for the tests. The correlation between alleles at two loci, r2, and the standardized disequilibrium coefficient for determining whether recombination or homoplasy had occurred between a pair of alleles, D2, were derived. Fisher’s exact test was calculated to compare alleles at any two loci. LD heatmap was used to scan for high linkage disequilibrium within chromosomes based on r2 values. High LD was characterized by many red squares in the heatmap generated. Many red blocks together would be inferred as corresponding to haplotype blocks [61, 62].
Loess regression was performed in R [47] to investigate LD decay by plotting the r2 values as a function of genetic distance in megabases (Mb). The decay was plotted to show the points representing the distance on each chromosome, at which the mean value of r2 decreased to half of the maximum value.
Quantile-quantile plots were also generated in R [47] to search for evidence of bias in the GWAS, such as due to genotyping artifacts, and to discern the extent to which the observed distribution of the test statistic followed the expected (null) distribution. The proportion of phenotypic variance explained by a marker was determined by the square of the partial correlation coefficient (R2).
Genomic prediction.
The predictive breeding value (GEBV) (along with the predictive error variance (PEV)) for each phenotypic trait was derived using ridge regression in TASSEL 5.2.50 software, which performed multiple correlation tests to compute correspondence between genotype and phenotype and accounted for collinearity to avoid bias. The genotypic data (737 SNPs in alpha format) were loaded, and kinship analysis was performed to generate an Identity by Descent (IBD) file. The phenotypes and K matrix were selected, and the genomic selection routine was run with 5-fold cross validation with 100 iterations.
Detecting candidate genes located within marker-trait association zones.
The identification of putative candidate genes was done using the latest cocoa genome sequence (T. cacao Criollo genome version 2) [29] available on the Cocoa Genome hub (https://cocoa-genome-hub.southgreen.fr/jbrowse). The biological functions of genes/transcripts close to the significant SNPs were determined by searching the SNP sequences with proteins related to seed development on the cocoa genome after referring to UniProtKB 2021 to identify such proteins. Flanking regions upstream and downstream of stable MTAs were selected to locate putative candidate genes. The lengths of the flanking regions were determined based on the linkage disequilibrium decay intervals on the respective chromosomes. For chromosomes 3, 5, 6 and 8, this was over a distance of 2.5 Mb upstream or downstream of the significant MTAs. It was over a distance of 5 Mb for chromosome 1, 1.6 Mb for chromosome 4, 0.87 Mb for chromosome 7 and 0.86 Mb for chromosome 9. A physical map, showing the position of annotated putative candidate genes that co-localised with SNP markers associated with traits of interest (MTAs), was then constructed using SpiderMap (Rami, 2007 unpublished, Spidermap v1.7.1, free software, CIRAD).
Results
Phenotypic data analysis
The phenotypic data for the fully characterized accessions are presented in S2 Table (http://dx.doi.org/10.13140/RG.2.2.29601.48484). The tests of normality results are presented in S3 Table (http://dx.doi.org/10.13140/RG.2.2.22890.59849). Tests of normality indicated significant deviation from normality for fruit length, width, fruit length to width ratio, total fresh seed mass, seed number, individual seed (cotyledon) mass and width, seed length to width ratio, pod index, ovule number, sepal length and width, and style length. Results of tests of normality performed on the natural log transformed values of fruit and seed quantitative traits indicated that natural log transformation corrected for the deviation from normality of these traits (S3 Table http://dx.doi.org/10.13140/RG.2.2.22890.59849). The phenotypic data were thus natural log transformed to conduct GWAS. Untransformed data were also subjected to GWAS for comparison.
The large phenotypic variation expressed in this panel of cacao genotypes, based on the coefficients of variation (S4 Table http://dx.doi.org/10.13140/RG.2.2.36312.37128), and evident in Fig 1 and S1 Fig (http://dx.doi.org/10.13140/RG.2.2.24148.88966), indicated a diverse genetic background, which was suitable for GWAS. The results of ANOVA for yield-related traits are presented in S5 Table (http://dx.doi.org/10.13140/RG.2.2.17438.00328). All of the traits differed significantly between wild and cultivated germplasm except for seed (cotyledon) length to width ratio and fruit length.
[Figure omitted. See PDF.]
The asterisks in the box and whisker plots represent outliers.
Correlation analysis revealed the interdependence of traits on one another. Of the 91 Pearson correlation coefficients, r, calculated for the quantitative traits, only 20 were not significant (P ≥ 0.05). It is noteworthy that the yield-related traits, seed number, seed/cotyledon mass and seed dimensions, were highly correlated (P ≤ 0.001). Correlations of the yield-related traits for 346 cacao accessions are presented in the Correlogram (Fig 2) Seed number was negatively correlated (r = -0.205***) with individual dried cotyledon mass. Seed/cotyledon length and width were positively correlated with individual dried seed/cotyledon mass (r = 0.688*** and 0.751***, respectively). This suggests that the former traits can be used as indicators of the latter, a significant finding.
[Figure omitted. See PDF.]
Positive correlations are displayed as blue circles and negative correlations as orange circles. The sizes of the circles are proportional to the correlation coefficients. The plant organs for which anthocyanin intensity was measured were the flower ligule, filament and pedicel, and mature fruit ridges and seed cotyledons).
There was also a strong correlation (0.568***) between ovule number and seed number that justifies prediction of the latter using the former when fruits are unavailable. The correlations of fruit width with pod index and seed length, r = -0.621*** and 0.680***, respectively, were also of interest.
Correlations for anthocyanin intensity in the various plant organs are presented in Fig 2. All of these correlations, based on the Spearman correlation coefficient (rs), were significant (P ≤ 0.05) except for the anthocyanin pigment concentration in the pedicel and anthocyanin pigmentation of the cotyledon (r = -0.097). It was notable that the correlations involving mature fruit surface (ridges) anthocyanin intensity and that in the ligule and filament of the flower and in the seed/cotyledon were all significantly (P ≤ 0.05) negative (r = -0.150, -0.257, -0.147, respectively).
Population structure
Ten replicate runs of models using genotypic clusters (K) from 2 to 15 confirmed that K = 7 had the highest log-likelihood probability (log Pr (X | K) versus K) (Fig 3).
[Figure omitted. See PDF.]
Analysis of population structure of 421 cacao accessions using STRUCTURE—estimated LnP(K) of possible clusters (K) from 2 to 15. When K is approaching a true value, L(K) plateaus (or continues increasing slightly).
The population structure analysis revealed that 74% of the accessions could be stratified into seven sub-populations, while 26% could be regarded as admixtures. The constitution of the seven genetic clusters is presented in S6 Table (http://dx.doi.org/10.13140/RG.2.2.27504.33282). The IMC and AMAZ groups were clustered together as were also the SCA, MO and LCTEEN groups (S6 Table http://dx.doi.org/10.13140/RG.2.2.27504.33282). Several Upper Amazon accessions were genotyped as Trinitario, and are putatively mislabelled. The genetic diversity of the accessions studied is represented in the neighbour-joining tree in Fig 4, which depicts the seven clusters differentiated. (One SPEC accession was not clustered with other accessions).
[Figure omitted. See PDF.]
The tree was generated in DARwin Version 6 and rendered in iTOL version 6 (https://itol.embl.de/). Seven admixed groups are evident.
Linkage disequilibrium (LD)
D2 greater than 0.6, where D2 of 1 represents the highest amount of disequilibrium possible, was indicative of recombination or homoplasy between pairs of alleles (Table 2). With regard to the mean squared correlations of allele frequencies, r2 values, none were observed to be greater than 0.2 and the mean r2 across the 10 chromosomes was 0.1 (Table 2). Chromosomes 1 and 3 had mean r2 > 0.14. An average decay of r2 to 50% (to r2 = 0.1) over chromosomes 1, 4, 5, 7 and 9 occurred at a distance of 5.21 Mb (9 cM) (S7 Table http://dx.doi.org/10.13140/RG.2.2.20793.44647). LD decay plots, based on r2, are presented in Fig 5.
[Figure omitted. See PDF.]
Plot of pairwise linkage disequilibrium coefficients (r2) on chromosome 1; Plot of pairwise linkage disequilibrium coefficients (r2) on chromosome 4; Plot of pairwise linkage disequilibrium coefficients (r2) on chromosome 5; Plot of pairwise linkage disequilibrium coefficients (r2) on chromosome 7; Plot of pairwise linkage disequilibrium coefficients (r2) on chromosome 9. Heatmap of linkage disequilibrium (r2) across the chromosomes 4 and 5 based on data for 421 cacao accessions genotyped using 612 filtered SNPs. Markers were ordered on the x and y axes in the Heatmap according to location along the chromosomes and each cell of the heatmap represents a single marker pair. The upper triangle, above the black diagonal on the heatmap, is colour-coded based on the r2 value between SNPs while colours depicted in the lower triangle are based on P-values for the corresponding r2 values.
[Figure omitted. See PDF.]
The linkage disequilibrium (LD) decay patterns, observed for the chromosomes in this study population and a LD heat map (Fig 5, S7 Table http://dx.doi.org/10.13140/RG.2.2.20793.44647), were used to inform the process of searching for putative annotated candidate genes. JBrowse in version 2 of the Criollo B97-61/B2 genome (https://cocoa-genome-hub.southgreen.fr/) was used to locate candidate genes upstream or downstream of MTA zones. The search for putative candidate genes with relevant functional (biological) roles such as in seed development, identified in UniProtKB, was conducted within defined intervals, based on linkage disequilibrium decay, spanning MTAs on each chromosome (S7 Table http://dx.doi.org/10.13140/RG.2.2.20793.44647).
Putatively robust marker-trait associations (MTAs)
The TASSEL MLM GWAS unravelled 17 significant associations. These were significant above the stringent Bonferroni threshold of P ≤ 8.17 × 10−5. There was a predominance of positive associations between highly heritable qualitative traits such as fruit shape, anthocyanin intensity on fruit surface, fruit apex form, and flower filament anthocyanin intensity and SNP markers (Table 3, S8 Table http://dx.doi.org/10.13140/RG.2.2.34215.21927). However, there were also some significant associations between SNPs and quantitative traits. Two of the seventeen highly significant (P ≤ 8.17 × 10−5) MTAs, which were identified with TASSEL MLM, were corroborated at this level of significance with GAPIT FarmCPU (Table 3). Interestingly, the most significant (P ≤ 1.15 × 10−14) MTA based on the TASSEL MLM, which was observed for seed number and TcSNP 1335 on chromosome 7, and another involving TcSNP 1335 and log seed length on chromosome 7 were not detected when the FarmCPU model was used. However, FarmCPU revealed a significant MTA involving TcSNP 1126 and seed length on chromosome 7. The highly significant MTA involving log seed number and TcSNP 785 on chromosome 1 (P≤ 2.38 × 10−05) (Table 3) was also not verified with FarmCPU, but a MTA for TcSNP 67 and seed length was identified on chromosome 1 (P ≤ 0.005). A comparison of the outputs of GWAS based on TASSEL MLM and GAPIT FarmCPU indicated 14 common MTAs involving yield-related traits at P ≤ 0.003 (Table 3). Manhattan plots and associated Quantile-Quantile plots are presented for yield-related traits in Fig 6, based on TASSEL MLM.
[Figure omitted. See PDF.]
Genome-wide association plots across 8 cacao chromosomes for seven phenotypic traits that had statistically significant MTAs: filament anthocyanin intensity, fruit surface (ridges) anthocyanin intensity, log fruit length, log seed length, log seed number, seed length to width ratio, seed number.
* Based on TASSEL version 5.2.50 MLM results for 421 cacao accessions (612 SNPs).
* Chromosome “11” was designated for unmapped SNP markers (some of which have recently been mapped).
* X- and Y-axes represent the SNP markers along each chromosome and the -log10(P-value), respectively.
* The red horizontal line corresponds to the Bonferonni significance threshold of P-values ≤ 8.17 × 10−5 (–log10 (P) = 4.088) and the blue line corresponds to a significance level of 0.005.
Fig 6 Quantile–quantile plots of estimated−log10 (P) from genome-wide association studies using TASSEL MLM. Quantile–quantile plots of estimated−log10 (P) for filament anthocyanin intensity; Quantile–quantile plots of estimated−log10 (P) for fruit surface (ridges) anthocyanin intensity; Quantile–quantile plots of estimated−log10 (P) for log fruit length; Quantile–quantile plots of estimated−log10 (P) for log seed length; Quantile–quantile plots of estimated−log10 (P) for log seed number; Quantile–quantile plots of estimated−log10 (P) for seed length to width ratio; Quantile–quantile plots of estimated−log10 (P) for seed number. The plots provide no evidence of bias in the GWAS, such as due to genotyping artifacts, and display the extent to which the observed distribution of the test statistic followed the expected (null) distribution. The red line represents expected P-values with no associations.
[Figure omitted. See PDF.]
The relatively low number of SNPs significantly associated with phenotypic traits, 2.4% of all SNPs used, was probably partly due to the stringent statistical thresholds applied in this study. Consequently, the results were carefully scrutinized just below the level of significance to discern MTAs, which may also have valid functional (biological) significance and to preclude false negative MTAs associated with TASSEL MLM. The results of the GAPIT FarmCPU analysis were confirmatory for 15 significant MTAs, including 14 yield-related traits, found with TASSEL MLM. These were significant above the Bonferroni threshold in two cases and at P ≤ 0.003 in the others (Table 3, S8 Table http://dx.doi.org/10.13140/RG.2.2.34215.21927).
Putative annotated candidate genes
Nine putative annotated candidate genes with functional roles related to seed development, lipid biosynthesis and transfer and carbohydrate transport were identified on chromosome 1 based on TASSEL MLM (Table 4). TcSNP 785, at position 31,785,158 bp on chromosome 1, was co-localized with gene Tc01v2_g025880, which is functionally significant since it encodes protein disulfide isomerase that may be required for proper pollen development, ovule fertilization and embryo development (Table 4). This finding was not corroborated with the FarmCPU analysis, which only detected TcSNP 67 to be significantly associated with seed length on chromosome 1 (at position 35,469,159 bp). The functional protein co-localised with TcSNP 67 was 40S ribosomal protein S13.
[Figure omitted. See PDF.]
On chromosome 3, 11 putative candidate genes were detected with TASSEL MLM. They encode traits associated with seed development and lipid accumulation (Table 4).
Associations based on TASSEL MLM between log seed length and TcSNP 953 (P ≤ 9.61 × 10−04) as well as with log seed length and TcSNP344 (P ≤ 2.88 × 10−05), pod index and TcSNP 555 (P ≤ 8.24 × 10−04), and seed number and TcSNP 1160 (P ≤ 1.35 × 10−04) as well as those confirmed with FarmCPU (Table 3), suggest that chromosome 4 may contain a cluster or ‘hotspot’ of QTLs for yield-related traits. Several MTAs involving TcSNPs 344 and 555 on chromosome 4, TcSNP 733 on chromosome 5 and TcSNP 642 on chromosome 8 were also confirmed with FarmCPU.
There were two putative candidate genes involved with seed development co-localised with TcSNP 344 on chromosome 4, six that encode for seed development that were co-localised with TcSNP 667 (TASSEL MLM), eight responsible for seed protein and development and sugar transport that were linked to TcSNP 555, including Sugar transporter ERD6-like 16, and five with seed development functions co-localised with TcSNP 1160 (TASSEL MLM) (Table 4). In addition, seven genes involved with lipid formation and seed development, such as Acyltransferase-like protein At3g26840%2C chloroplastic, Glycerol-3-phosphate dehydrogenase, and Stearoyl-[acyl-carrier-protein] 9-desaturase, were localised close to TcSNP 953 on chromosome 4. TcSNP 953 (position 2,822,152 on chromosome 4) is located 3.7 Kb upstream of the gene that encodes acyl transferase-like protein At3g26840. Putative E3 ubiquitin protein ligase DRIP2, which acts as a negative regulator of the response to water stress was also co-localised with SNP 555 on chromosome 4 (Table 4).
On chromosome 5, seven putative candidate genes were co-localised with TcSNP 733. These were all involved in seed development. Of particular interest was Glycerol-3-phosphate 2-O-acyltransferase 6, which is rate-limiting enzyme in the de novo pathway of glycerolipid synthesis (Table 4). One candidate gene encodes soluble inorganic pyrophosphatase 4, which is important for development, but also in stress resistance (including to cadmium ion response) in plants (https://www.uniprot.org/uniprot/Q9LFF9). The latter was not of functional significance in this study, but is important for optimised cocoa production systems. In addition, one functional candidate gene involved with seed development was co-localised with TcSNP 823 on chromosome 5, based on TASSEL MLM (Table 4).
Three putative candidate genes were found on chromosome 7, one of which was a sugar transporter. Likewise, three functional candidate genes with seed development roles were detected on chromosome 8 although the MTA, involving pod index and SNP 642, was below the stringent Bonferroni threshold (P ≤ 4.39 × 10−04) based on TASSEL MLM. The latter MTA was verified and significant at P ≤ 1.02 × 10−3 based on the FarmCPU analysis (Table 3).
TASSEL MLM revealed a MTA of functional significance on chromosome 9, albeit below the Bonferroni threshold. TcSNP 184, located at 10,934,938 bp on chromosome 9, was co-localized with four putative candidate genes, one of which encodes Zinc finger protein CONSTANS-LIKE 5. The latter is involved in the regulation of flower development and regulation of transcription. The most significant MTA (P ≤ 9.57 × 10−05) on chromosome 9, based on the FarmCPU analysis, involved TcSNP932, at position 1,479,475 bp, and log cotyledon mass (Table 3). It was 14.3 Kb downstream of 1-acyl-sn-glycerol-3-phosphate acyltransferase 1%2C chloroplastic, which has biological significance in this study.
It must also be noted that TcSNP 401 (20,485,872), located 199 Kb upstream of Tc00_t058610, which encodes a putative MYB-related protein 308, 2 Mb upstream of the gene, Tc04v2_g008890, which encodes a putative MYB family transcription factor, and 1.3 Mb upstream of the gene, Tc04v2_g009300, which encodes a MYB domain protein 20, was significantly associated with fruit surface (ridge) anthocyanin intensity on chromosome 4, based on the TASSEL and FarmCPU analyses. A significant association (P ≤ 10 × 4.57−05) (TASSEL MLM) was also found between fruit surface anthocyanin intensity and SNP 644 on chromosome 9 (S8 Table http://dx.doi.org/10.13140/RG.2.2.34215.21927).
With regard to floral traits, there was a highly significant (P ≤ 3.81 × 10−05) MTA, which was detected between sepal length and TcSNP 1334 on chromosome 3 when TASSEL MLM was performed (S8 Table http://dx.doi.org/10.13140/RG.2.2.34215.21927). This was not tested with GAPIT FarmCPU.
Of the 17 significant MTAs identified with TASSEL MLM, there were two for fruit shape oblate (an uncommon phenotype in this diverse cacao germplasm sample), and orbicular on chromosome 3. These MTAs involved TcSNPs 1353 and 1477. Interestingly, loci controlling fruit shape were dispersed over several chromosomes, representing independent linkage groups.
In summary, about 40 putative candidate genes of functional importance were identified in this study. These included those that encode protein precursors, carbohydrate transport, lipid synthesis/bioassembly, binding and metabolism, lipid transfer and seed storage, endosperm development and regulation of seed growth, embryo development leading to seed dormancy, seed development/morphogenesis and regulation of flower and pollen development and ovule fertilization as well as responses to water deprivation, cadmium contamination and other abiotic stresses and biotic (disease) stresses (Table 4).
Genomic prediction value of traits
The genomic estimated breeding values (GEBV) of the phenotypic traits studied are presented in S9 Table (http://dx.doi.org/10.13140/RG.2.2.12404.83842). Of the qualitative traits studied, fruit basal constriction and filament anthocyanin intensity had predictive values greater than 0.5. The quantitative traits, seed number, seed mass, seed length, seed width, seed length to width ratio, pod index, fruit wall hardness, ovule number and fruit width had GEBV values greater than 0.5. The yield-related traits, seed number and dried seed (cotyledon) mass had GEBV values of 0.611 and 0.6014, respectively. The GEBV value of cotyledon length and width, indicators of seed size, were 0.6199 and 0.5435, respectively. Interestingly, ovule number had the largest GEBV value of 0.6325. It is regarded as a reliable predictor of seed number, which is dependent on successful pollination.
Discussion
Linkage disequilibrium patterns
Our findings, based on the linkage disequilibrium analyses, suggest that MTAs could be localized in the cacao genome with relatively high precision particularly when wild cacao genotypes are studied. This concurs with the observations of Stack et al. [40], who observed that the wild genotypes (such as those from the Purús, Contamana, Curaray, Iquitos, Nanay, Marañón and Guiana genetic groups) exhibited very low overall LD. In our study, an average decay of r2 to 50% over chromosomes 1, 4, 5, 7 and 9 occurred at a relatively short distance of 5.21 Mb (9 cM) on average (S7 Table http://dx.doi.org/10.13140/RG.2.2.20793.44647). It is instructive that for cultivated germplasm such as Trinitario (e.g., the Imperial College Selection (ICS) accession group (S1 Table http://dx.doi.org/10.13140/RG.2.2.16179.71202), LD decay was observed in previous studies to be very gradual with increasing marker distance [40, 63].
Marker-trait associations (MTAs)
The highly significant MTAs involving quantitative yield-related traits, observed in this study, suggest stability of genomic regions involved. However, the genetic markers, co-localized with genes, were not major (Table 3). Previous studies [10, 33, 63–72] identified useful MTAs and reference has been made to 300 QTLs [33]. Knowledge gleaned on MTAs related to yield potential (as measured by Pod Index), dried individual seed mass, seed size and seed number per fruit in diverse cacao germplasm should prove valuable for future genomics-assisted breeding for yield improvement.
The relatively small effect size of the markers associated with yield-related traits, in this study, where none of the markers explained more than 20% of the phenotypic variation expressed, is not unusual for quantitative traits. Most of the markers studied explained 5 to 11% of the phenotypic variation expressed.
Genetic variation of quantitative (polygenic, continuous) traits such as yield and disease resistance are controlled by the combined effects of QTL, epistasis (interactions between QTLs) [14], the environment and interaction between environment and QTL [73]. The use of only biallelic subsets of SNPs, in this study, could have excluded multiallelic loci, which may have contributed to additional variance expressed in the study population for polygenic traits, such as those related to yield potential. Mir et al. [74] described yield as a very complex quantitative trait that is controlled by a network of minor genes. For such polygenic traits, with small effect size, increasing the sample size of the study population and densely sampling a population that shows phenotypic diversity should improve the power to detect meaningful associations [75].
A phenomenon whereby significant associations were found in this study between certain traits, such as fruit anthocyanin intensity, fruit shape and seed length to width ratio and seed number, at different loci (Table 3) was of interest. It suggests that a large part of trait variance was explained by several marker-trait associations, as has been observed in other studies [73]. In this study, there were 26 SNPs significantly associated with fruit shape, two on chromosome 1, seven on chromosome 2, nine on chromosome 3, three on chromosome 5, two on chromosome 7 and three on chromosome 8, based on the results of TASSEL MLM. It seems justifiable to hypothesize that minor genes as well as major genes affect fruit shape. The oblate shape is a trait associated with certain wild types, which have evolved over a long period of time. A well-known cacao accession with this trait is CATONGO.
The presence of markers significantly associated with different traits, in the same genomic region, was also observed in this study, based on the TASSEL MLM. The latter traits were seed number and orbicular shape on chromosome 7 (SNP 390) and seed number and seed length, also on chromosome 7 (SNP 1335) (Tables 3 and 4). These associations may represent the occurrence of linked genes, each one coded separately for a specific trait [11] or may indicate co-localization of the respective markers with a gene or gene block with putative pleiotropic effect [76]. In the case of seed number and seed length, this apparent linkage is striking since pod index (a measure of yield potential) is derived from seed number and dried individual cotyledon mass. The latter was found to be significantly correlated with seed dimensions (length and width) in this study (Fig 2). The relevant associated putative genes may have adaptive influence due to linkage mediated by selective forces [77]. The likelihood of such a phenomenon being observed during this study was feasible due to the inclusion of at least 48 cultivated accessions, including 28 Imperial College Selections (ICS) (S1 Table http://dx.doi.org/10.13140/RG.2.2.16179.71202). The latter reportedly evolved over a period of more than two hundred years in Trinidad and Tobago and were selected based on large seed size and seed number and favourable yield, among other selection criteria. It must be underscored that pleiotropic markers may facilitate simultaneous selection of the multiple traits with which they are significantly associated and thus gene pyramiding.
Putative annotated candidate genes for yield-related traits
The search for candidate genes in this study was guided by the fact that the storage compounds of cacao seeds are starch, lipids (fats) and storage proteins [78]. Bucheli et al. [79] investigated the variation of sugars, carboxylic acids, purine alkaloids, fatty acids, and endoproteinase activity during maturation of cacao seeds. Aspartic endoproteinase activity was observed to increase rapidly during seed expansion and a major change in the fatty acid composition occurred in the young embryo. Mustiga et al. [80] detected a major QTL explaining 24% of the relative level of palmitic acid on the distal end of chromosome 4, located close to the Thecc1EG017405 gene. The latter is an orthologue and isoform of the stearoyl-acyl carrier protein (ACP) desaturase (SAD) gene that is involved in fatty acid biosynthesis. Cacao seeds also contain a vicilin-like globulin, a seed storage protein [81].
There are three acyltransferases and a phosphohydrolase involved in the bioassembly of plant storage lipids, viz., glycerol-3-phosphate acyltransferase (GPAT), lyso-phosphatidic acid acyltransferase (LPAT), diacylglycerol acyltransferase (DGAT) and phosphatidate phosphohydrolase (PAPase) [82]. Fritz et al. [83] purified glycerol-3-phosphate acyltransferase from the post-microsomal supernatant of cocoa seeds.
Triacylglycerols (TAGs) are the major storage lipids in several plants and serve as energy reserves in seeds that are later used for germination and seedling development [84, 85]. The terminal step in TAG formation in plants involves the catalytic action of diacylglycerol acyltransferase (DGAT) in the presence of acyl-CoA [84]. Developing seeds in Brassica napus have been reported to produce Diaylglycerol (DAG) during the active phase of oil accumulation [86].
The proteins encoded by candidate genes, which were co-localized with SNPs found to be significantly associated with yield-related traits, during this study (Table 3), are presented in Table 4. An association (just below the stringent significance level with Bonferroni correction) observed between seed length (TASSEL MLM and FarmCPU) and seed cotyledon mass (FarmCPU) and TcSNP 953 on chromosome 4, at a position of 2,822,152 bp, 3.7 Kb upstream of a gene that encodes diacylglycerol acyltransferase (Acyltransferase-like protein At3g26840%2C chloroplastic) (Table 4), is among the most noteworthy of this study and warrants further investigation with expression studies.
At3g26840 is involved in seed storage (globulins) and seed size in Assembly cotton 46 (Gossypium spp.) [82]. Gossypium spp. are related to cacao, both being members of the Malvaceae family. The discovery of genes that encode Glycerol-3-phosphate dehydrogenase, and Stearoyl-[acyl-carrier-protein] 9-desaturase on chromosome 4 was also of relevance. The association revealed by FarmCPU between TcSNP 932, on chromosome 9, with log cotyledon mass, must be considered due to its functional importance. This SNP was co-localised with 1-acyl-sn-glycerol-3-phosphate acyltransferase 1%2C chloroplastic, which is involved in lipid metabolism.
Another putative candidate gene, unravelled during this study, was Tc01v2_g022850, which encodes Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein involved in Glycerolipid metabolism (chromosome 1, 2.5 Mb downstream of SNP 785 (31,785,158) on chromosome 1) (Table 4). It is must also be recorded that TcSNP 555, on chromosome 4, was co-localised with vicilin in this study.
Consensus MTAs involving yield-related traits in cacao
Since several hundred MTAs or QTLs have previously been identified in cacao, an attempt was made to check for congruence between the findings of this study and previous ones with respect to common traits. Highly significant (stable) associations between yield-related traits, seed length and seed length to width ratio, seed number and seed (cotyledon) mass and pod index, and SNPs were found on chromosomes 1, 3, 4, 5, 7 and 9 in this study (Table 3). The findings from previous studies that coincide with those of this study are highlighted below.
Eight significant associations between SSR markers and yield-related traits were reported by Marcano et al. [87]. These included associations with fresh seed mass on chromosomes 1, 2, 5, 6, 9 and 10, MTAs involving dried seed mass (100 seeds) on chromosomes 2, 4, 9 and 10 as well as one marker associated with seed number per fruit on chromosome 5. In their mapping study, dos Santos Fernandes et al. [88] identified several QTLs flanked by the markers on chromosome 4, which were associated with pod index, dried individual seed mass, number of fruits harvested and number of healthy fruits harvested. They also identified a significant association between the marker, Tcm002s23708704, and pod index on chromosome 2. dos Santos Fernandes et al. [88] unravelled 13 candidate genes linked to yield (dried seed mass, pod number), on chromosomes 4 and 2. Nine of these genes are annotated as transmembrane transporters, specializing in sugar transport, two genes are involved in carbohydrate metabolism, one gene codes for lipid metabolism, and one is involved in glucose metabolism [88]. Motilal et al. [64] identified three SNPs (TcSNP 368, 697, 1370) on chromosomes 1 and 9 that were significantly associated with seed number. Clément et al. [20] found two QTLs for yield in the clone, POUND 12, located close to a QTL for yield, identified in IMC 78, on chromosome 4.
Previous studies thus commonly observed loci on chromosomes 1, 2, 4, and 9 to be associated with seed mass and dimensions in cacao [87]. The findings of this study suggest that yield-related traits are associated with loci on chromosomes 1, 3, 4, 5, 7, 8 and 9, putatively linked to functional genes (Tables 3 and 4). Two SNPs, TcSNP 344 and 953, were associated with seed length on chromosome 4 (Tables 3 and 4). TcSNP 953 is located at the top of chromosome 4 (Fig 7) and thus the MTA, though below the stringent Bonferroni level of significance, may be considered validated since it was observed in a common region where yield-related MTAs were located in the studies by dos Santos Fernandes et al. [88] and Clément et al. [20].
[Figure omitted. See PDF.]
Gene loci and proteins are shown on the right and genetic distances (Mb) are shown on the left. No candidate genes were identified on chromosomes 2 and 10.
A highly significant MTA, observed in this study, involved seed number and TcSNP 785 on chromosome 1 (TASSEL MLM). Motilal et al. [64] also reported such an MTA on chromosome 1.
Criollo, Forastero and Trinitario are widely recognised classes of cocoa in the trade. The cacao accessions in this study that displayed favourable yield potential were mainly Trinitario (cultivated germplasm) [44] and those with Criollo ancestry. This was due mainly to their large seed sizes (Table 5). This supports the deduction of Doebley et al. [89] that “cultivated species generally have larger fruits or seeds compared to their wild ancestors, indicating that fruit and seed size are major agronomic traits that have been selected in crops during their domestication.” However, several Upper Amazon Forastero types, in this study, also had favourable (low) pod index due to their large seed numbers (Table 5, S1 Fig http://dx.doi.org/10.13140/RG.2.2.24148.88966).
[Figure omitted. See PDF.]
Improving yield potential with MTAs unravelled
Based on the findings of this study, elucidation and selection of genotypes associated with large seed size in T. cacao may thus be facilitated by using TcSNP 953, TcSNP 555 and TcSNP 344 on chromosome 4 and TcSNP 733 on chromosome 5 (Tables 3 and 4). The results of a preliminary evaluation to detect genotypes associated with favourable yield potential (pod index) are presented in Table 5 and involve TcSNP 555, TcSNP 642 (on chromosome 8) and TcSNP 953. Further investigation with haplotype inferencing for genomic prediction and involving training and test populations of T. cacao L. in genomic selection studies, as described by Bhat et al. [90], is recommended.
Additional studies to investigate functional genomics associated with yield-related traits in cacao are also recommended. Such studies, in wheat, have revealed transcription factors, which can affect seed number, genes involved in metabolism or signalling of growth regulators, genes determining cell division and proliferation related to seed size, and floral regulators that regulate inflorescence architecture and seed number. Genes involved in carbohydrate metabolism, affecting plant architecture and grain yield such as trehalose phosphate synthase (TPS) and trehalose phosphate phosphatase (TPP) genes have also been identified [91].
These recommended follow-up studies would entail gene expression analysis, involving transcriptomics as described by Jako et al. [82]. The DGAT gene, Tag1, from Arabidopsis was shown to encode an acyl-CoA-dependent DGAT. Jako et al. [82] also demonstrated that seed-specific over-expression of the DGAT cDNA in wild-type Arabidopsis “enhances oil deposition and average seed mass, which are correlated with DGAT transcript levels”, and that “DGAT has an important role in regulating the quantity of seed TAGs, the sink size in developing seeds and thus seed size.” They also demonstrated that “over-expression” of the acyl-CoA-dependent DGAT in a “seed-specific manner in wild-type Arabidopsis plants results in increased oil deposition and average seed mass.”
Seed size has also been shown to be directly determined by carbohydrate import into seeds, in maize and rice, and involves SWEET-mediated hexose transport [92]. SWEET genes regulate the transport, distribution and storage of carbohydrates in plants, and are involved in many important physiological processes, including phloem loading, reproductive development, disease-resistance, stress response, and host-pathogen interaction. In this study, SWEET17 was localized on chromosome 4 upstream of TcSNP555 (Table 4), which was associated with pod index (P ≤ 4.29 × 10−4). SWEET 2 was also localized downstream of TcSNP1335, which was significantly associated (P≤ 1.15 ×10−14) with seed number as well as with log seed length (P≤ 6.75 × 10−05) on chromosome 7 (TASSEL MLM) (Table 4).
Anthocyanin pigmentation relationships
In this study, fruit ridge anthocyanin concentration (fruit colour) was significantly associated with TcSNP 401 on chromosome 4, located at 20,485,872, about 199 Kb upstream of the gene Tc00_t058610 (mRNA). The latter encodes a Putative MYB-related protein 308. TcSNP 401 was also 198 Kb upstream of the Transcription repressor MYB 6 (Table 4) (https://cocoa-genome-hub.southgreen.fr/). This SNP accounted for 7.1% of the phenotypic variation in fruit colour observed (P ≤ 4.78 × 10−06) with TASSEL MLM (Table 3). The MTA was corroborated with FarmCPU. Four SNP variants on chromosome 4, co-localised with a MYB transcription factor gene (TcMYB113), were inferred to encode ‘fruit colour differences’ between cacao varieties by Motamayor et al. [70].
Another significant association (P ≤ 6.79 × 10−05) for fruit surface anthocyanin intensity was found in this study with TASSEL MLM analysis. It involved TcSNP 1203, on chromosome 3 (located at 563,101 bp), which accounted for 5.6% of the phenotypic variation in fruit surface anthocyanin concentration in this cohort of germplasm.
Three regions associated with pigmentation on different organs in cacao have been identified by Marcano et al. [87]. This sector includes the major locus identified by Crouzillat et al. [12] as responsible for controlling ‘seed colour’ in Catongo x POUND 12 backcross progeny.
MYB proteins are involved in regulatory networks controlling metabolism, including the synthesis of anthocyanins, responsible for the red pigmentation in cacao [93, 94]. Liu et al. [95] observed that overexpression of the Tc-MYBPA gene elicited increased expression of several genes encoding the major structural enzymes of the proanthocyanidin and anthocyanidin pathway in cacao.
There were also significant associations between filament anthocyanin concentration and SNPs on chromosomes 3 and 8 (TcSNPs 1183 and 1441, respectively), based on the TASSEL MLM results of this study (Table 4, S8 Table http://dx.doi.org/10.13140/RG.2.2.34215.21927). All of the MTAs involving anthocyanin concentration in this study suggest multi-gene control of anthocyanin intensity in the mature fruit epidermis and flower filaments of cacao. Furthermore, the differential expression of pigmentation in the seeds and fruits of cacao, observed at the ICGT, as evidenced in the negative correlations in Fig 2, and as stated by Bartley [96], may be explained by the association of pigmentation of seeds and fruit pigmentation with several genomic regions.
MTAs involving anthocyanin intensity warrant further investigation to determine their value for genomic selection due to the significance of this trait in differentiating among certain genotypes of interest, such as CCN-51 and ancient Nacional cacao [70, 94], the nutraceutical value of anthocyanin and its putative role in cacao disease resistance [34, 95].
Highlights of this study
The results of this study support the observation of Rockman [97] that most complex traits (such as those related to yield in cacao) are controlled by several (putatively interacting) loci with small effects. While some phenotypic traits are controlled by a small number of loci with large effects (as is often the case for traits under biotic selection) [74], others, such as yield, may have more complex genetic architectures. The latter may be controlled by many rare variants, each having a large effect on the phenotype or conversely, many common variants with only small effects on the phenotypes [98]. The causative variants may be clustered in one or a small number of genes or across many genes.
The data on gene ontology presented in Table 4 provide evidence of polygenic control of yield-related traits in cacao. For such traits, it may be more effective to predict the performance of genotypes by using multiple molecular markers [98]. Multilocus mixed linear models (MMLMs) may thus be considered for future studies in cacao, when complex traits are investigated, because these incorporate multiple markers simultaneously as covariates in a stepwise MLM [99].
The observations regarding drought tolerance in this study are of considerable significance since the conditions at the ICGT, where these accessions were conserved, are considered sub-optimal in terms of soil moisture content [44]. Therefore, the latter results, which indicate drought adaptation among genotypes with inherent high yield potential, merit follow-up functional genomic studies.
Once the functional roles of putative genes co-localized with markers with significant associations to traits of interest are elucidated, the effects of relationships of these putative genes with geography and local adaptation must be established, as recommended by McKown et al. [100]. Micheli et al. [101] have reported on functional genomics in cacao focusing on genes expressed under specific physiological conditions. Consistency in QTL effects over different genetic backgrounds must also be established. Individuals identified with favourable marker genotypes and haplotypes may then be used as parental types for enhancement or breeding programmes, such as those involving Multiparent Advanced Generation Intercross Populations (MAGIC), in targeted environments.
Despite the fact that yield is a complex trait, our results on potential genomic selection (GS) for yield traits are very promising, given the high predictive values obtained for these traits, generally superior to 0.5. The detection of several markers associated with yield-related traits with good predictive value, in this study, could facilitate GS and marker-assisted selection (MAS) for yield in cacao.
Conclusion
In this study, carefully collated phenotyping data on traits of economic interest in cacao, such as yield potential (S2 Table http://dx.doi.org/10.13140/RG.2.2.29601.48484), and SNP genotyping data (S10 Table http://dx.doi.org/10.13140/RG.2.2.25826.61123), generated via transcriptome sequencing, were subjected to GWAS. Twenty-nine cacao accessions represent promising material for breeding in terms of yield potential. The results presented herein indicated oligogenic and polygenic control of yield-related traits in cacao. The stringently significant marker-trait associations related to yield, found in this study, were indications of the presence of QTL on chromosomes such as 3, 4, 5, 7, 8 and 9. Chromosome 4 putatively contains a QTL cluster associated with yield-related traits. Some annotated candidate genes associated with seed development, seed lipid accumulation, metabolism and development and plant stress responses (including to drought and to cadmium) have been identified during this research. The identification of yield-related traits with good predictive value could further facilitate GS for yield potential in cacao. The performance of the non-phenotyped individuals at the Trinidad genebank (ICGT) could also be predicted once they are genotyped. This would be particularly useful for the enhanced genotypes (GEBP progeny) [45]. Identification of parents possessing high predictive values and favourable alleles and haplotypes, prior to crossing, should prove beneficial for more rapid development of enhanced cacao progenies.
Supporting information
S1 Table. Background information on the T. cacao L. accessions used in the analyses.
http://dx.doi.org/10.13140/RG.2.2.16179.71202.
https://doi.org/10.1371/journal.pone.0260907.s001
(DOCX)
S2 Table. Phenotypic data for 346 cacao accessions fully phenotyped and used to generate descriptive statistics.
http://dx.doi.org/10.13140/RG.2.2.29601.48484.
https://doi.org/10.1371/journal.pone.0260907.s002
(XLSX)
S3 Table. Results of tests of normality performed on the natural log transformed fruit and seed quantitative traits.
http://dx.doi.org/10.13140/RG.2.2.22890.59849.
https://doi.org/10.1371/journal.pone.0260907.s003
(DOCX)
S4 Table. Descriptive statistics for quantitative fruit and seed traits in wild, cultivated and unclassified cacao germplasm studied.
http://dx.doi.org/10.13140/RG.2.2.36312.37128.
https://doi.org/10.1371/journal.pone.0260907.s004
(DOCX)
S5 Table. ANOVA for yield-related traits.
http://dx.doi.org/10.13140/RG.2.2.17438.00328.
https://doi.org/10.1371/journal.pone.0260907.s005
(DOCX)
S6 Table. Coefficients of membership for clusters of accessions based on STRUCTURE analysis.
http://dx.doi.org/10.13140/RG.2.2.27504.33282.
https://doi.org/10.1371/journal.pone.0260907.s006
(DOCX)
S7 Table. Distances over which linkage disequilibrium decayed to 50 percent over chromosomes 1, 4, 5, 7 and 9.
http://dx.doi.org/10.13140/RG.2.2.20793.44647.
https://doi.org/10.1371/journal.pone.0260907.s007
(DOCX)
S8 Table. Summary of significantly positive marker-trait associations (TASSEL MLM and GAPIT FarmCPU).
http://dx.doi.org/10.13140/RG.2.2.34215.21927.
https://doi.org/10.1371/journal.pone.0260907.s008
(XLSX)
S9 Table. Predictive values (GEBV) of phenotypic traits associated with SNPS.
http://dx.doi.org/10.13140/RG.2.2.12404.83842.
https://doi.org/10.1371/journal.pone.0260907.s009
(DOCX)
S10 Table. Genotype data used in GWAS.
http://dx.doi.org/10.13140/RG.2.2.25826.61123.
https://doi.org/10.1371/journal.pone.0260907.s010
(TXT)
S1 Fig. Summary report for Pod index in wild cacao and Summary report for Pod index in cultivated cacao.
http://dx.doi.org/10.13140/RG.2.2.24148.88966.
https://doi.org/10.1371/journal.pone.0260907.s011
(TIF)
Acknowledgments
The Director of CRC, Prof. Pathmanathan Umaharan, is gratefully acknowledged for endorsing this collaborative research. Useful discussions with Drs. Didier Clément, Christian Cilas and Martijn Ten Hoppen, CIRAD, France, Dr. Michelle End and Mr. R.A. (Tony) Lass, UK and Tricianna Maharaj, Trinidad, are deeply appreciated. J. Bhola, Dr. W. Mollineau, V. Badall, A. Richardson-Drakes, N. Persad, S. Samnarine, C. Jagroop, T. Jugmohan, E. Solozano and other individuals are gratefully recognized for technical assistance in phenotyping at various times during the period of study. The comments of anonymous reviewers are also acknowledged.
Citation: Bekele FL, Bidaisee GG, Allegre M, Argout X, Fouet O, Boccara M, et al. (2022) Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential. PLoS ONE 17(10): e0260907. https://doi.org/10.1371/journal.pone.0260907
About the Authors:
Frances L. Bekele
Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
E-mail: [email protected]
Affiliation: Cocoa Research Centre, The University of The West Indies, St. Augustine, Trinidad and Tobago
https://orcid.org/0000-0002-4345-4358
Gillian G. Bidaisee
Roles: Data curation, Methodology, Writing – review & editing
Affiliation: Cocoa Research Centre, The University of The West Indies, St. Augustine, Trinidad and Tobago
Mathilde Allegre
Roles: Data curation, Formal analysis, Investigation, Methodology, Software, Writing – review & editing
Affiliations Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR AGAP Institute, Montpellier, France, AGAP Institute, CIRAD, INRAE, Institut Agro, University of Montpellier, Montpellier, France
Xavier Argout
Roles: Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing
Affiliations Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR AGAP Institute, Montpellier, France, AGAP Institute, CIRAD, INRAE, Institut Agro, University of Montpellier, Montpellier, France
Olivier Fouet
Roles: Data curation, Formal analysis, Investigation, Methodology, Writing – review & editing
Affiliations Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR AGAP Institute, Montpellier, France, AGAP Institute, CIRAD, INRAE, Institut Agro, University of Montpellier, Montpellier, France
Michel Boccara
Roles: Data curation, Investigation, Methodology, Writing – review & editing
Affiliations Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR AGAP Institute, Montpellier, France, AGAP Institute, CIRAD, INRAE, Institut Agro, University of Montpellier, Montpellier, France
Duraisamy Saravanakumar
Roles: Supervision, Writing – original draft, Writing – review & editing
Affiliation: Department of Food Production, Faculty of Food and Agriculture, The University of the West Indies, St Augustine, Trinidad and Tobago
Isaac Bekele
Roles: Formal analysis, Software, Validation, Visualization, Writing – review & editing
Affiliation: Faculty of Food and Agriculture, The University of the West Indies, St Augustine, Trinidad and Tobago
Claire Lanaud
Roles: Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing
Affiliations Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UMR AGAP Institute, Montpellier, France, AGAP Institute, CIRAD, INRAE, Institut Agro, University of Montpellier, Montpellier, France
1. Alverson WS, Whitlock BA, Nyffeler R, Bayer C, Baum DA. Phylogeny of the core Malvales: evidence from ndhF sequence data. American Journal of Botany. 1999 Oct;86(10):1474–86. pmid:10523287
2. Expert market research. Expert Market Research Report. 2020. https://www.expertmarketresearch.com/reports/chocolate-market. Accessed August 6, 2020.
3. Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J et al. The genome of Theobroma cacao. Nature Genetics. 2011 Feb;43(2):101–8. pmid:21186351
4. Cheesman EE. Notes on the nomenclature, classification and possible relationships of cacao populations. Tropical Agriculture. 1944;21(8).
5. Eskes A, Lanaud C. Cocoa. In: Charrier A, Jacquot M, Hamon S, Nicolas D, editors. Tropical Plant Breeding. Montpellier: CIRAD; 2001. p. 78–105.
6. Bekele F, Phillips-Mora W. Cocoa Breeding. In: Al-Khayri JM et al., editors. Advances in Plant Breeding: Industrial and Food Crops, Vol 6. Springer-Verlag, Cham; 2019. p. 409–87. https://doi.org/10.1007/978-3-030-23265-8_12
7. Lanaud C, Fouet O, Legavre T, Lopes U, Sounigo O, Eyango MC et al. Deciphering the Theobroma cacao self-incompatibility system: from genomics to diagnostic markers for self-compatibility. Journal of Experimental Botany. 2017 Oct 13;68(17):4775–90. pmid:29048566
8. Ribeyre F, Sounigo O, Argout X, Cilas C, Efombagn MI, Denis M et al. The genomic selection of Theobroma cacao L: a new strategy of marker assisted selection to improve breeding efficiency and predict useful traits in new populations. International Symposium on Cocoa Research. 2017 Nov 13–17; Lima, Peru. London: ICCO. http://agritrop.cirad.fr/589763/1/ID589763.pdf
9. Simmonds NW. The breeding of perennial crops. Proceedings of the Workshop on the Conservation, Characterisation and Utilization of Cocoa Genetic Resources in the 21st Century;1992 Sep 13–17; Port-of-Spain, Trinidad. Port-of-Spain: The Cocoa Research Unit;1993. p. 156–62.
10. Clément D, Risterucci AM, Motamayor JC, N’Goran J, Lanaud C. Mapping QTL for yield components, vigor, and resistance to Phytophthora palmivora in Theobroma cacao L. Genome. 2003a Apr 1;46(2):204–12. pmid:12723036
11. Araújo IS, de Souza Filho GA, Pereira MG, Faleiro FG, de Queiroz VT, Guimarães CT et al. Mapping of quantitative trait loci for butter content and hardness in cocoa beans (Theobroma cacao L.). Plant Molecular Biology Reporter. 2009 Jun 1;27(2):177–83. https://link.springer.com/content/pdf/10.1007/s11105-008-0069-9.pdf
12. Crouzillat D, Lerceteau E, Pétiard V, Morera J, Rodríguez H, Walker D et al. Theobroma cacao L.: a genetic linkage map and quantitative trait loci analysis. Theoretical and Applied Genetics. 1996 Jul 1;93(1–2):205–14. pmid:24162219
13. Crouzillat D, Ménard B, Mora A, Phillips W, Pétiard V. Quantitative trait analysis in Theobroma cacao using molecular markers. Euphytica. 2000a Jul;114(1):13–23.
14. Crouzillat D, Phillips W, Fritz PJ, Pétiard V. Quantitative trait loci analysis in Theobroma cacao using molecular markers. Inheritance of polygenic resistance to Phytophthora palmivora in two related cacao populations. Euphytica. 2000b Jul;114(1):25–36.
15. N’Goran JA, Risterucci AM, Clément D, Sounigo O, Lorieux M, Lanaud C. Identification of quantitative trait loci (QTL) in Theobroma cacao L. L. Agron Afr. 1997;9:55–63.
16. Lanaud C, Kébé IS, Risterucci AM, Clément D, N’Goran JA, Grivet L et al. Mapping quantitative trait loci (QTL) for resistance to Phytophthora palmivora in T. cacao. Proceedings of the 12th International Cocoa Research Conference; 1996 Nov 17; Bahia, Brazil. Lagos: Cocoa Producers’ Alliance; 1999. p. 99–105.
17. Lanaud C, Boult E, Clapperton J, N’Goran JKA, Cros E, Chapelin M et al. Identification of QTLs related to fat content, seed size an sensorial traits in Theobroma cacao L. Proceedings of the 14th International Cocoa Conference; 2003 Oct; Accra, Ghana. Lagos: Cocoa Producers’ Alliance;2005. p. 1119–26.
18. Lanaud C, Fouet O, Clément D, Boccara M, Risterucci AM, Surujdeo-Maharaj S et al. A meta-QTL analysis of disease resistance traits of Theobroma cacao L. Molecular Breeding. 2009 Nov;24(4):361–74.
19. Flament MH, Kébé I, Clément D, Pieretti I, Risterucci AM, N’Goran JA et al. Genetic mapping of resistance factors to Phytophthora palmivora in cocoa. Genome. 2001 Feb 1;44(1):79–85.
20. Clément D, Risterucci AM, Motamayor JC, N’Goran J, Lanaud C. Mapping quantitative trait loci for bean traits and ovule number in Theobroma cacao L. Genome. 2003b Feb 1;46(1):103–11. pmid:12669802
21. Clément D, Lanaud C, Sabau X, Fouet O, Le Cunff L, Ruiz E, et al. Creation of BAC genomic resources for cocoa (Theobroma cacao L.) for physical mapping of RGA containing BAC clones. Theoretical and Applied Genetics. 2004 May 1;108(8):1627–34. pmid:15235775
22. Risterucci AM, Paulin D, Ducamp M, N’Goran JA, Lanaud C. Identification of QTLs related to cocoa resistance to three species of Phytophthora. Theoretical and Applied Genetics. 2003 Dec 1;108(1):168–74. pmid:13679987
23. Pugh T. Etude du déséquilibre de liaison chez le cacaoyer appartenant aux groupes Criollo/Trinitario. Application au marquage génétique d’intérêt pour la sélection. Thèse Doctorat. Montpellier: Ecole National Supérieur d’Agonomie; 2005.107 p.
24. Pugh T, Fouet O, Risterucci AM, Brottier P, Abouladze M, Delettrez C et al. A new codominant marker-based cocoa linkage map: development and integration of new microsatellite markers into cocoa linkage map. A new cocoa reference map. Proceedings of 14th International Cocoa Research Conference; 2003 Oct 13–18; Accra, Ghana. Lagos: Cocoa Producers’ Alliance; 2005. p. 153–60.
25. Brown JS, Schnell RJ, Motamayor JC, Lopes U, Kuhn DN, Borrone JW. Resistance gene mapping for witches’ broom disease in Theobroma cacao L. in an F2 population using SSR markers and candidate genes. Journal of the American Society for Horticultural Science. 2005 May 1;130(3):366–73.
26. Brown JS, Phillips-Mora W, Power EJ, Krol C, Cervantes-Martinez C, Motamayor JC et al. Mapping QTLs for resistance to frosty pod and black pod diseases and horticultural traits in Theobroma cacao L. Crop Science. 2007 Sep;47(5):1851–8.
27. Brown JS, Sautter RT, Tondo CT, Borrone J, Kuhn D, Motamayor J et al. A composite linkage map from the combination of three crosses made from commercial clones of cacao, T. cacao L. Trop Plant Biol. 2008 Apr 22;1(2):120–30.
28. Faleiro FG, Queiroz VT, Lopes UV, Guimarães CT, Pires JL, Yamada MM et al. Mapping QTLs for witches’ broom (Crinipellis perniciosa) resistance in cacao (Theobroma cacao L.). Euphytica. 2006 May;149(1):227–35.
29. Argout X, Martin G, Droc G, Fouet O, Labadie K, Rivals E et al. The cacao Criollo genome v2. 0: an improved version of the genome for genetic and functional genomic studies. BMC Genomics. 2017 Dec 1;18(1):730. pmid:28915793
30. Saski CA, Feltus FA, Staton ME, Blackmon BP, Ficklin SP, Kuhn DN et al. A genetically anchored physical framework for Theobroma cacao cv. Matina 1–6. BMC Genomics. 2011 Dec;12(1):413–25. pmid:21846342
31. Feltus FA, Saski CA, Mockaitis K, Haiminen N, Parida L, Smith Z et al. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes. BMC Genomics. 2011 Dec;12(1):1–6. pmid:21794110
32. Fouet O, Allegre M, Argout X, Jeanneau M, Lemainque A, Pavek S et al. Structural characterization and mapping of functional EST-SSR markers in Theobroma cacao. Tree Genetics & Genomes. 2011 Aug;7(4):799–817.
33. Allegre M, Argout X, Boccara M, Fouet O, Roguet Y, Bérard AU et al. Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L. DNA Research. 2012 Feb 1;19(1):23–35. pmid:22210604
34. Argout X, Fouet O, Wincker P, Gramacho K, Legavre T, Sabau X et al. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions. BMC Genomics. 2008 Dec 1;9(1):512. pmid:18973681
35. Akhunov E, Nicolet C, Dvorak J. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theoretical and Applied Genetics. 2009 Aug 1;119(3):507–17. https://link.springer.com/content/pdf/10.1007/s00122-009-1059-5.pdf pmid:19449174
36. Myles S, Peiffer J, Brown PJ, Ersoz ES, Zhang Z, Costich DE et al. Association mapping: critical considerations shift from genotyping to experimental design. The Plant Cell. 2009 Aug 1;21(8):2194–202. http://www.plantcell.org/content/plantcell/21/8/2194.full.pdf pmid:19654263
37. Breseghello F, Sorrells ME. Association analysis as a strategy for improvement of quantitative traits in plants. Crop Science. 2006 May;46(3):1323–30.
38. Li H, Bradbury P, Ersoz E, Buckler ES, Wang J. Joint QTL linkage mapping for multiple-cross mating design sharing one common parent. PloS One. 2011 Mar 15;6(3):e17573. pmid:21423655
39. Hill WG, Robertson A. Linkage disequilibrium in finite populations. Theoretical and Applied Genetics. 1968 Jun;38(6):226–31. pmid:24442307
40. Stack JC, Royaert S, Gutiérrez O, Nagai C, Holanda IS, Schnell R et al. Assessing microsatellite linkage disequilibrium in wild, cultivated, and mapping populations of Theobroma cacao L. and its impact on association mapping. Tree Genetics & Genomes. 2015 Apr 1;11(2):19.
41. Bekele FL, Bekele I, Butler DR, Bidaisee GG. Patterns of morphological variation in a sample of cacao (Theobroma cacao L.) germplasm from the International Cocoa Genebank, Trinidad. Genetic Resources and Crop Evolution. 2006 Aug;53(5):933–48.
42. Motamayor JC, Lachenaud P, Da Silva e Mota JW, Loor R, Kuhn DN, Brown JS et al. Geographic and genetic population differentiation of the Amazonian chocolate tree (Theobroma cacao L). PloS One. 2008 Oct 1;3(10):e3311. pmid:18827930
43. Bekele F, Butler DR. Proposed short list of cocoa descriptors for characterization. Working procedures for cocoa germplasm evaluation and selection. Proceedings of the CFC/ICCO/IPGRI Project Workshop; 1998 Feb 1–6; Montpellier, France. Rome: International Plant Genetic Resources Institute (IPGRI); 2000.p. 41–8.
44. Bekele FL, Bidaisee GG, Singh H, Saravanakumar D. Morphological characterisation and evaluation of cacao (Theobroma cacao L.) in Trinidad to facilitate utilisation of Trinitario cacao globally. Genetic Resources and Crop Evolution. 2020a Mar;67(3):621–43.
45. Bekele F, Bidaisee G, Saravanakumar D. Examining phenotypic diversity and economic value of cacao (Theobroma cacao L.) conserved at the International Cocoa Genebank, Trinidad to support improvement in cocoa yield globally. Tropical Agriculture. 2020b (released 2021 Feb 25);97(2). https://journals.sta.uwi.edu/ojs/index.php/ta/article/view/7970
46. Iwaro AD, Bekele FL, Butler DR. Evaluation and utilisation of cacao (Theobroma cacao L.) germplasm at the International Cocoa Genebank, Trinidad. Euphytica. 2003 Mar;130(2):207–21.
47. R Core Team. R: A Language and Environment for Statistical Computing. 887 R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. 2017; 888.
48. Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, Tang C et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 2007 Jan 19;3(1):e4. pmid:17238287
49. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000 Jun 1;155(2):945–59. pmid:10835412
50. Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theoretical Population Biology. 2001 Nov 1;60(3):227–37. pmid:11855957
51. Pritchard JK, Wen W, Falush D. Documentation for STRUCTURE software: Version 2.3. University of Chicago, Chicago, IL. 2010 Feb 2:1–37. http://pritch.bsd.uchicago.edu/structure.html
52. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology. 2005 Jul;14(8):2611–20. pmid:15969739
53. Hubisz MJ, Falush D, Stephens M, Pritchard JK. Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources. 2009 Sep;9(5):1322–32. pmid:21564903
54. Perrier X, Jacquemoud-Collet JP. DARwin software. 2006. http://darwin.cirad.fr/
55. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007 Oct 1;23(19):2633–5. pmid:17586829
56. Liu Xiaolei, Huang Meng, Fan Bin, Buckler Edward S., and Zhang Zhiwu. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genetics. 2016 Feb 1;12(2):e1005767. pmid:26828793
57. Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975 Jun 1:423–47.
58. Endelman JB, Jannink JL. Shrinkage estimation of the realized relationship matrix. G3: Genes| Genomes| Genetics. 2012 Nov 1;2(11):1405–13. pmid:23173092
59. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Transactions on information theory. 1967 Jan;13(1):21–7.
60. Gao X, Becker LC, Becker DM, Starmer JD, Province MA. Avoiding the high Bonferroni penalty in genome-wide association studies. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society. 2010 Jan;34(1):100–5. pmid:19434714
61. Utro F, Haiminen N, Livingstone D, Cornejo OE, Royaert S, Schnell RJ et al. iXora: exact haplotype inferencing and trait association. BMC Genetics. 2013 Dec;14(1):1–5. pmid:23742238
62. Gutiérrez-López N, Ovando-Medina I, Salvador-Figueroa M, Molina-Freaner F, Avendaño-Arrazate CH, Vázquez-Ovando A. Unique haplotypes of cacao trees as revealed by trnH-psbA chloroplast DNA. PeerJ. 2016 Apr 7;4:e1855. pmid:27076998
63. Marcano M, Pugh T, Cros E, Morales S, Páez EA, Courtois B et al. Adding value to cocoa (Theobroma cacao L.) germplasm information with domestication history and admixture mapping. Theoretical and Applied Genetics. 2007 Mar 1;114(5):877–84. pmid:17252253
64. Motilal LA, Zhang D, Mischke S, Meinhardt LW, Boccara M, Fouet O et al. Association mapping of seed and disease resistance traits in Theobroma cacao L. Planta. 2016 Dec;244(6):1265–76. pmid:27534964
65. Motilal LA, Sounigo O, Thévenin JM, Risterucci AM, Pieretti I, Noyer JL et al. Theobroma cacao L.: genome map and QTLs for Phytophthora palmivora resistance. Towards the effective and optimum promotion of cocoa through research and development. Proceedings of the 13th International Cocoa Research Conference; 2000 Oct 9–14; Kota Kinabalu, Malaysia. Lagos: Cocoa Producers’ Alliance; 2001. p. 111–17.
66. Queiroz VT, Guimarães CT, Ahnert D, Schuster I, Daher RT, Pereira MG et al. Identification of a major QTL in cocoa (Theobroma cacao L.) associated with resistance to Witches’ Broom disease. Plant Breeding. 2003 Jun;122(3):268–72.
67. Royaert S, Phillips-Mora W, Leal AM, Cariaga K, Brown JS, Kuhn DN et al. Identification of marker-trait associations for self-compatibility in a segregating mapping population of Theobroma cacao L. Tree Genetics & Genomes. 2011 Dec;7(6):1159–68.
68. Royaert S, Jansen J, da Silva DV, de Jesus Branco SM, Livingstone DS, Mustiga G et al. Identification of candidate genes involved in Witches’ Broom disease resistance in a segregating mapping population of Theobroma cacao L. in Brazil. BMC Genomics. 2016 Dec;17(1):107. pmid:26865216
69. Sounigo O, Efombagn B, Lemainque A et al. Association mapping on cocoa: a way to identify functional SSR markers linked to yield, tolerance to black pod and mirids assessed in Cameroon and develop a marker assisted breeding programme. Proceedings of the 16th International Cocoa Research Conference; 2009 Nov 16–21; Bali, Indonesia. Lagos: Cocoa Producers’ Alliance; 2012. p.153-8.
70. Motamayor JC, Mockaitis K, Schmutz J, Haiminen N, Livingstone D III, Cornejo O et al. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biology. 2013 Jun;14(6):1–25. pmid:23731509
71. Da Silva MR, Clément D, Gramacho KP, Monteiro WR, Argout X, Lanaud C et al. Genome-wide association mapping of sexual incompatibility genes in cacao (Theobroma cacao L.). Tree Genetics & Genomes. 2016 Jun;12(3):1–3.
72. Osorio-Guarín JA, Berdugo-Cely JA, Coronado-Silva RA, Baez E, Jaimes Y, Yockteng R. Genome-wide association study reveals novel candidate genes associated with productivity and disease resistance to Moniliophthora spp. in cacao (Theobroma cacao L.). G3: Genes, Genomes, Genetics. 2020 May 1;10(5):1713–25. pmid:32169867
73. Semagn K, Bjørnstad Å, Xu Y. The genetic dissection of quantitative traits in crops. Electronic Journal of Biotechnology. 2010 Sep;13(5):16–7. https://scielo.conicyt.cl/pdf/ejb/v13n5/a16.pdf
74. Mir RR, Choudhary N, Bawa V, Jan S, Singh B, Ashraf Bhat M et al. Allelic diversity, structural analysis and genome-wide association study (GWAS) for yield and related traits using unexplored common bean (Phaseolus vulgaris L.) germplasm from Western Himalayas. Frontiers in Genetics. 2021;11:1797. pmid:33584807
75. Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013 Dec;9(1):1–9. pmid:23876160
76. Caspari E. Pleiotropic gene action. Evolution. 1952 March; 6:1–18. https://www.jstor.org/stable/2405500
77. Yeaman S. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proceedings of the National Academy of Sciences. 2013 May 7;110(19):E1743–51. Available from: pmid:23610436
78. Elwers S, Zambrano A, Rohsius C, Lieberei R. Histological features of phenolic compounds in fine and bulk cocoa seed (Theobroma cacao L.). J Appl Bot Food Qual. 2010 Sep 1;83(2):182–8.
79. Bucheli P, Rousseau G, Alvarez M, Laloi M, McCarthy J. Developmental variation of sugars, carboxylic acids, purine alkaloids, fatty acids, and endoproteinase activity during maturation of Theobroma cacao L. seeds. Journal of Agricultural and Food Chemistry. 2001 Oct 15;49(10):5046–51. pmid:11600064
80. Mustiga GM, Morrissey J, Stack JC, DuVal A, Royaert S, Jansen J et al. Identification of climate and genetic factors that control fat content and fatty acid composition of Theobroma cacao L. beans. Frontiers in Plant Science. 2019 Oct 14;10:1159. pmid:19654263
81. Amin I, Jinap S, Jamilah B, Harikrisna K, Biehl B. Analysis of vicilin (7S)-class globulin in cocoa cotyledons from various genetic origins. Journal of the Science of Food and Agriculture. 2002 May 15;82(7):728–32.
82. Jako C, Kumar A, Wei Y, Zou J, Barton DL, Giblin EM et al. Seed-specific over-expression of an Arabidopsis cDNA encoding a diacylglycerol acyltransferase enhances seed oil content and seed weight. Plant Physiology. 2001 Jun 1;126(2):861–74. pmid:11402213
83. Fritz PJ, Fritz KA, Kauffman JM, Patterson GR, Robertson CA, Stoesz DA et al. Cocoa seeds: changes in protein and polysomal RNA during development. Journal of Food Science. 1985 Jul;50(4):946–50. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2621.1985.tb12986.x
84. Lung SC, Weselake RJ. Diacylglycerol acyltransferase: a key mediator of plant triacylglycerol synthesis. Lipids. 2006 Dec;41(12):1073–88. pmid:17269553
85. Sørensen BM, Furukawa-Stoffer TL, Marshall KS, Page EK, Mir Z, Forster RJ et al. Storage lipid accumulation and acyltransferase action in developing flaxseed. Lipids. 2005 Oct;40(10):1043–9. pmid:16382576
86. Perry HJ, Harwood JL. Changes in the lipid content of developing seeds of Brassica napus. Phytochemistry. 1993 Jan 1;32(6):1411–5.
87. Marcano M, Morales S, Hoyer MT, Courtois B, Risterucci AM, Fouet O et al. A genomewide admixture mapping study for yield factors and morphological traits in a cultivated cocoa (Theobroma cacao L.) population. Tree Genetics & Genomes. 2009 Apr 1;5(2):329–37.
88. dos Santos Fernandes L, Correa FM, Ingram KT, de Almeida AA, Royaert S. QTL mapping and identification of SNP-haplotypes affecting yield components of Theobroma cacao L. Horticulture Research. 2020 Mar 1;7(1):1–8. pmid:32140235
89. Doebley JF, Gaut BS, Smith BD. The molecular genetics of crop domestication. Cell. 2006 Dec 29;127(7):1309–21. pmid:17190597
90. Bhat JA, Ali S, Salgotra RK, Mir ZA, Dutta S, Jadon V et al. Genomic selection in the era of next generation sequencing for complex traits in plant breeding. Frontiers in Genetics. 2016 Dec 27;7:221. https://www.frontiersin.org/articles/10.3389/fgene.2016.00221/full pmid:28083016
91. Mangini G, Blanco A, Nigro D, Signorile MA, Simeone R. Candidate genes and quantitative trait loci for grain yield and seed size in Durum Wheat. Plants 2021;10:312–28. pmid:33562879
92. Sosso D, Luo D, Li QB, Sasse J, Yang J, Gendrot G et al. Seed filling in domesticated maize and rice depends on SWEET-mediated hexose transport. Nature Genetics. 2015 Dec;47(12):1489. pmid:26523777
93. Falcone Ferreyra ML, Rius S, Casati P. Flavonoids: biosynthesis, biological functions, and biotechnological applications. Frontiers in Plant Science. 2012 Sep 28;3:222. pmid:23060891
94. Devy L, Anita-Sari I, Saputra TI, Susilo AW, Wachjar A. Identification of molecular marker based on MYB Transcription Factor for the selection of Indonesian Fine Cacao (Theobroma cacao L.). Pelita Perkebunan (a Coffee and Cocoa Research Journal). 2018 Aug 31;34(2):59–68.
95. Liu Y, Shi Z, Maximova SN, Payne MJ, Guiltinan MJ. Tc-MYBPA is an Arabidopsis TT2-like transcription factor and functions in the regulation of proanthocyanidin synthesis in Theobroma cacao. BMC Plant Biology. 2015 Dec;15(1):1–6. pmid:26109181
96. Bartley BGD. The genetic diversity of cacao and its utilization. The genetic diversity of cacao and its utilization. Wallingford: CABI Publishing; 2005.
97. Rockman MV. The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution: International Journal of Organic Evolution. 2012 Jan;66(1):1–7. pmid:22220860
98. Lee S, Abecasis GR, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics. 2014 Jul 3;95(1):5–23. pmid:24995866
99. Qi Z, Song J, Zhang K, Liu S, Tian X, Wang Y et al. Identification of QTNs controlling 100-seed weight in soybean using multilocus genome-wide association studies. Frontiers in Genetics. 2020 Jul 16;11:689. pmid:32765581
100. McKown AD, Klápště J, Guy RD, Geraldes A, Porth I, Hannemann J et al. Genome-wide association implicates numerous genes underlying ecological trait variation in natural populations of Populus trichocarpa. New Phytologist. 2014 Jul;203(2):535–53. pmid:24750093
101. Micheli F, Maximova S, Gramacho KP, Guiltinan M, Wilkinson MJ, Lanaud C et al. Functional genomics of cacao. In: Jean-Claude K, Michel D, editors. Advances in Botanical Research, Chapter 3. London: Academic Press; 2010. p.119–177.
102. Li N, Xu R, Li Y. Molecular networks of seed size control in plants. Annual Review of Plant Biology. 2019 Apr 29;70:435–63. (Table 4). pmid:30795704
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2022 Bekele et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
A genome-wide association study (GWAS) was undertaken to unravel marker-trait associations (MTAs) between SNP markers and phenotypic traits. It involved a subset of 421 cacao accessions from the large and diverse collection conserved ex situ at the International Cocoa Genebank Trinidad. A Mixed Linear Model (MLM) in TASSEL was used for the GWAS and followed by confirmatory analyses using GAPIT FarmCPU. An average linkage disequilibrium (r2) of 0.10 at 5.2 Mb was found across several chromosomes. Seventeen significant (P ≤ 8.17 × 10−5 (–log10 (p) = 4.088)) MTAs of interest, including six that pertained to yield-related traits, were identified using TASSEL MLM. The latter accounted for 5 to 17% of the phenotypic variation expressed. The highly significant association (P ≤ 8.17 × 10−5) between seed length to width ratio and TcSNP 733 on chromosome 5 was verified with FarmCPU (P ≤ 1.12 × 10−8). Fourteen MTAs were common to both the TASSEL and FarmCPU models at P ≤ 0.003. The most significant yield-related MTAs involved seed number and seed length on chromosome 7 (P ≤ 1.15 × 10−14 and P ≤ 6.75 × 10−05, respectively) and seed number on chromosome 1 (P ≤ 2.38 × 10−05), based on the TASSEL MLM. It was noteworthy that seed length, seed length to width ratio and seed number were associated with markers at different loci, indicating their polygenic nature. Approximately 40 candidate genes that encode embryo and seed development, protein synthesis, carbohydrate transport and lipid biosynthesis and transport were identified in the flanking regions of the significantly associated SNPs and in linkage disequilibrium with them. A significant association of fruit surface anthocyanin intensity co-localised with MYB-related protein 308 on chromosome 4. Testing of a genomic selection approach revealed good predictive value (genomic estimated breeding values (GEBV)) for economic traits such as seed number (GEBV = 0.611), seed length (0.6199), seed width (0.5435), seed length to width ratio (0.5503), seed/cotyledon mass (0.6014) and ovule number (0.6325). The findings of this study could facilitate genomic selection and marker-assisted breeding of cacao thereby expediting improvement in the yield potential of cacao planting material.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer