Crop plant diversity managed by genebanks is of great value in the context of the changing needs of agriculture (Smale & Jamora, 2020), but genetic and phenotypic information on this diversity remains insufficiently available for most genebanks (McCouch et al., 2013, 2020). The advent of next-generation sequencing (NGS) has enabled, at ever-decreasing cost, the sequencing of reference genomes for many crops as well as high-density genotyping of large numbers of samples per crop. Genotyping is a powerful tool for identifying gaps or redundancies in germplasm collections and, when combined with phenotyping data, can be used to detect correlations between genome regions and agronomic traits. For some crops, massive sequencing and data processing efforts have already been undertaken, as shown for the rice, wheat and barley germplasm collections (Milner et al., 2019; Sansaloni et al., 2020; Wang et al., 2018). Such approaches are increasingly within reach of many genebanks worldwide, including the CGIAR international collections (Halewood, Lopez Noriega et al., 2018).
For bananas (Musa spp.), the largest ex situ collection is maintained in vitro at one of the CGIAR international genebanks, the International Musa Germplasm Transit Centre (ITC), which holds more than 1,600 accessions (Van den houwe et al., 2020). In addition, over 60 national collections worldwide conserve banana diversity and conduct related research (Figure 1). Bananas (including plantains) are arguably the world's most important fresh fruit and a major staple food for hundreds of millions of people in low-income countries. World production is estimated at 158 million tons annually, and gross banana exports are worth US$12.8 billion to exporting countries (FAOSTAT 2019). Furthermore, most of the global production is grown by smallholders for their own consumption or for local trade; ranked by total production and food consumption, banana is the fourth most important food crop in the least developed countries (LDCs) as defined by the United Nations.
FIGURE 1. Diversity of banana bunches at a germplasm collection exhibited at the National Research Centre for Banana (NRCB) in Trichy, India (with genebank curators at the back). Photograph taken by Julie Sardos
To improve understanding of its complex genetics and thereby support crop improvement, the first whole banana genome sequence was released in 2012 for an accession of Musa acuminata (D’Hont et al., 2012) (Table 1). This original reference has recently been supplemented with genome assemblies of several other Musa taxa (Rouard et al., 2018; Wang et al., 2019; Wu et al., 2016). In parallel, high-throughput genotyping methods, namely genotyping-by-sequencing (GBS) (Elshire et al., 2011) and restriction site-associated DNA sequencing (RADseq) (Davey et al., 2010), have been used to investigate single nucleotide polymorphisms (SNPs) in various panels of accessions available at the ITC genebank (Cenci et al., 2020; Sardos et al., 2016). In addition, other SNP datasets have been generated from gene expression and proteomics experiments on subsets of accessions studied for drought tolerance (Cenci et al., 2019; van Wesemael et al., 2019).
TABLE 1 An overview of banana
| Categories | Description |
|---|---|
| Geographic origin | South-East Asia and West Oceania |
| Geographic distribution | Humid tropics and subtropics |
| Total global production | >158 million tons (FAOSTAT, 2019) |
| Taxonomy | ~75 species and 500–1,000 cultivars |
| Biology | A giant herb belonging to the monocots; vegetatively propagated and perennial; parthenocarpic cultivars with low fertility |
| Ploidy | Diploid, triploid and tetraploid |
| Basic genome information | 11 chromosomes^a; 550–600 million base pairs; approx. 35,000 genes |
| Common uses | Dessert, cooking, beer, textile, medicine |
| Nutrition | Rich source of carbohydrates, fiber, potassium, vitamin B6 and vitamin C; some varieties are rich in carotenoids |
| Main breeding objectives | Drought tolerance, disease resistance (e.g., Fusarium wilt, black leaf streak, banana bacterial wilt (BXW)), biofortification (e.g., ProVitA), and post-harvest traits (texture, flavor) |
^a Haploid chromosome number of most cultivated bananas and their crop wild relatives in the Eumusa section. The chromosome number varies in other taxa: n = 7 (Ingentimusa section), n = 9 (M. beccarii), and n = 10 (Callimusa section).
While genetic variant information is being produced at a fast pace through various projects and is increasingly processed via standardized bioinformatics workflows, one of the main challenges is managing the growing volume of raw and intermediate files, which are difficult to handle for many applications. Bioinformatics workflows can produce millions of markers, which then need to be filtered in multiple ways depending on the type of analysis or the user's perspective, and working with these data is often challenging for people without bioinformatics expertise. Online information systems that handle big data linked to germplasm collections are scarce (König et al., 2020; Mansueto et al., 2017; Raubach et al., 2020; Ruas et al., 2017). Moreover, lack of access to phenotypic information continues to limit the use of plant genetic resources. Phenotypic data are complex: information on the context in which they were collected is indispensable, and the domain is continuously evolving (Germeier & Unger, 2019). Given these challenges, providing easy-to-use, interoperable and flexible solutions for navigating high-density genotyping and phenotyping data online remains key to genebanks’ mission of germplasm documentation and utilization.
In this study, we present an approach used to generate, store and disseminate a catalog of genetic variants of banana and plantain maintained in the ITC, which is available at
Material used to create the catalog mostly originates from lyophilized leaf tissues of young banana plants distributed by the ITC. Such tissue is the most convenient way to obtain DNA of an acceptable quality and quantity for high-throughput restriction enzyme-associated DNA sequencing methods, as for other omics techniques (Carpentier et al., 2007). Another advantage is that once in stock, the tissues are readily available for distribution, whereas in vitro material takes longer to obtain (i.e., an average of 2 months for proliferating tissues and 4 months for in vitro rooted plantlets).
The generated sequences (short reads from Illumina sequencing machines) were processed through bioinformatics workflows built from open source software, covering quality checks, read mapping onto reference genomes, SNP calling and prediction of variant effects in genic regions, as described in Sardos et al. (2016), Cenci et al. (2020) and Eyland et al. (2020). The outputs of these workflows are very large text files in the variant call format (VCF). For every accession, a genomic VCF (gVCF) file listing all variant and non-variant sites is also backed up on a server; this allows variants to be re-called on different sets of samples whenever necessary without reprocessing the raw reads, saving significant time and computing resources.
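To make the distinction between variant and non-variant records concrete, here is a minimal Python sketch (not the catalog's actual workflow) that reads a single-sample gVCF and flags which records are reference blocks and which carry a real alternate allele. It assumes GATK-style gVCF conventions, where reference blocks carry an `END=` INFO key and the symbolic `<NON_REF>` or `<*>` allele; the `Record` and `parse_gvcf` names are illustrative.

```python
from typing import Iterable, Iterator, NamedTuple

SYMBOLIC = {".", "<NON_REF>", "<*>"}  # placeholder ALT alleles used in gVCFs


class Record(NamedTuple):
    chrom: str
    pos: int               # 1-based start position, as written in the VCF
    end: int               # last covered position (END= for reference blocks)
    ref: str
    alts: tuple[str, ...]  # real ALT alleles, symbolic ones removed
    genotype: str          # raw GT string, e.g. "0/1" or "0/0/1"
    is_variant: bool       # True if the sample carries a real ALT allele


def parse_gvcf(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per data line of a single-sample (g)VCF."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information (##) and header (#CHROM) lines
        chrom, pos, _id, ref, alt, _qual, _filt, info, fmt, sample = line.rstrip("\n").split("\t")[:10]
        info_map = dict(kv.split("=", 1) if "=" in kv else (kv, "") for kv in info.split(";"))
        fmt_map = dict(zip(fmt.split(":"), sample.split(":")))
        alt_list = alt.split(",")
        start = int(pos)
        gt = fmt_map.get("GT", "./.")
        # Allele index 0 is REF; index i > 0 refers to alt_list[i - 1].
        called = [a for a in gt.replace("|", "/").split("/") if a != "."]
        carries_alt = any(i != "0" and alt_list[int(i) - 1] not in SYMBOLIC for i in called)
        yield Record(
            chrom=chrom,
            pos=start,
            end=int(info_map.get("END", start)),
            ref=ref,
            alts=tuple(a for a in alt_list if a not in SYMBOLIC),
            genotype=gt,
            is_variant=carries_alt,
        )


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "ITC0249"]),
        # reference block: positions 100-180 match the reference
        "\t".join(["chr01", "100", ".", "A", "<NON_REF>", ".", ".", "END=180", "GT:DP", "0/0:12"]),
        # heterozygous SNP at position 181
        "\t".join(["chr01", "181", ".", "G", "T,<NON_REF>", "50", ".", ".", "GT:DP", "0/1:15"]),
    ]
    for rec in parse_gvcf(demo):
        print(rec.pos, rec.end, rec.is_variant)  # prints "100 180 False" then "181 181 True"
```

Keeping the reference blocks is what makes re-calling possible: a sample that is homozygous reference at a site is recorded as such, rather than being indistinguishable from a sample that was never covered.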
SNP datasets for the ITC and other collections published in the literature (VCF files) (Table 2) were recorded in a non-relational database browsable via a web application called GIGWA (Sempéré et al., 2019), developed for the purpose of searching large genotyping datasets in an optimized manner. This system, easy to deploy on any platform, is species-agnostic and provides a user-friendly interface to perform advanced data filtering and export for third-party analytical software. It was seamlessly embedded in the Musa Germplasm Information System (MGIS
TABLE 2 List of studies and associated banana accessions available
| Purpose of the study/Dataset | Number of accessions (samples) | Collection | Taxonomical coverage | Number of markers | Sequencing technologies | References | Sequence data availability |
|---|---|---|---|---|---|---|---|
| GWAS panel for parthenocarpy and sterility | 106 | ITC | M. acuminata (wild) and AA (cultivated) | 7,079,397 | Genotyping by Sequencing (GBS) | (Sardos et al., 2016) | PRJNA305234 |
| Genome constitution of ABB, AB | 83 | ITC | ABB, M. acuminata and M. balbisiana | 683,264 | RAD sequencing | (Cenci et al., 2020) | PRJNA450532 |
| Genome constitution of AAB | 118 | ITC | AAB, M. acuminata and M. balbisiana | 3,315,168 | RAD sequencing | Publication in preparation | PRJNA450532 |
| Panel evaluated for drought tolerance | 10 (60) | ITC | AAA, AAB, ABB | 6,951,307 | RNA sequencing | (Cenci et al., 2019) | PRJNA305241 |
| Domestication and acuminata subspecies | 254 | ITC | M. acuminata and AA | 245,285 | RAD sequencing | (Sardos et al., 2021) | PRJNA450532 |
| A/B Structural variations | 207 | CRB-PT | Breeding population | 148,329 | RAD sequencing | (Baurens et al., 2019) | PRJNA448968 PRJEB28077 |
| Genome ancestry mosaics | 25 | CRB-PT, CARBAP | M. acuminata, AA, AAA | 191,876 | RNA sequencing | (Martin, Cardi, et al., 2020) | SRR956987 |
| Chromosome reciprocal translocations | 155 | CRB-PT, ITC | Breeding populations, M. acuminata, AA | 120,111 | Genotyping by Sequencing (GBS) | (Martin, Baurens, et al., 2020) | PRJNA667853 |
The diversity of edible bananas has been classified into genome groups according to the relative contribution of their ancestral wild species. Most cultivated bananas derive from Musa acuminata (A genome) alone or from its hybridization with Musa balbisiana (B genome), and the most frequent genome groups are diploid and triploid cultivars denoted AA, AB, AAA, AAB and ABB. The current catalog of genetic variants spans these species and groups for selected subsets of accessions (Table 2). Depending on the study, datasets range in size from 120,111 SNPs (245,285 among the studies restricted to ITC accessions) to more than 7 million SNPs.
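Because each letter in this notation stands for one haploid genome set, a genome group label can be decoded mechanically, which can be handy when filtering accession metadata. The sketch below is a hypothetical helper (`describe_genome_group` is not part of MGIS or GIGWA) that handles only the A and B genomes discussed here.

```python
from collections import Counter


def describe_genome_group(group: str) -> dict:
    """Decode a genome group label such as 'AAB'.

    Assumes one letter per haploid genome set: A = Musa acuminata,
    B = Musa balbisiana. Other genome letters are rejected.
    """
    group = group.strip().upper()
    if not group or set(group) - {"A", "B"}:
        raise ValueError(f"unexpected genome group: {group!r}")
    counts = Counter(group)
    return {
        "ploidy": len(group),  # number of genome sets = ploidy level
        "A": counts["A"],
        "B": counts["B"],
        "hybrid": counts["A"] > 0 and counts["B"] > 0,  # both ancestors present
    }
```

For example, `describe_genome_group("AAB")` returns a triploid with two A sets and one B set, flagged as an interspecific hybrid.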
The system is optimized for exploring large volumes of data and provides efficient filtering on a wide range of parameters, mostly genetic (e.g., chromosome location, percentage of missing data, minor allele frequency, effect of gene mutations) but also others. Accession details can be enriched with metadata such as passport data or agronomic traits (e.g., control vs. stress conditions in gene expression analyses), which then become filterable fields. The interface works with one or two groups of samples; when two groups are defined and combined with genotype pattern filters, it is straightforward to identify SNPs that discriminate between them (Figure 2). This is particularly useful for contrasting accessions by taxonomy or by a given trait to reveal unique alleles held by some of them. From the user interface, genetic variants in the catalog can be exported in various popular formats (e.g., VCF, BED) for further analysis or imported directly into other genetic analysis software. Alternatively, the content can be accessed programmatically through the Breeding API (BrAPI), a standardized machine-to-machine interface specification for plant breeding data (Selby et al., 2019). This facilitates essential connections with other information systems (e.g., as implemented in Musabase, the database for banana breeding data
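As an illustration of programmatic access, the sketch below builds a request for a BrAPI v2 `GET /variants` endpoint and parses the shared BrAPI JSON envelope (`result.data`). The base URL is a placeholder, since the catalog's endpoint is not given here, and the query parameters (`variantSetDbId`, `page`, `pageSize`) are assumptions to check against the BrAPI version the server actually implements.

```python
import json
from urllib.parse import urlencode

# Hypothetical server root: replace with the BrAPI endpoint exposed by the catalog.
BASE_URL = "https://example.org/brapi/v2"


def variants_url(variant_set_id: str, page: int = 0, page_size: int = 1000) -> str:
    """Build a GET /variants request for one variant set (one page of results)."""
    query = urlencode({"variantSetDbId": variant_set_id, "page": page, "pageSize": page_size})
    return f"{BASE_URL}/variants?{query}"


def parse_variants(payload: str) -> list[tuple[str, int, str, list[str]]]:
    """Extract (referenceName, start, referenceBases, alternateBases) tuples
    from a BrAPI-style JSON response body."""
    body = json.loads(payload)
    return [
        (v["referenceName"], int(v["start"]), v["referenceBases"], list(v.get("alternateBases", [])))
        for v in body["result"]["data"]
    ]
```

To fetch a real page, the URL could be opened with `urllib.request.urlopen` and the decoded body passed to `parse_variants`; walking through all pages would rely on the `metadata.pagination` block that accompanies each BrAPI response.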
FIGURE 2. Example of a genotyping study in the Musa Germplasm Information System (MGIS). (a) Each study can be opened to access the list of genebank accessions with passport data and a genetic tree. (b) Web interface listing the 649 SNPs discriminating between 26 wild banana accessions and cultivated Pisang Jari Buya bananas. (c) Detail view of genetic variants by accession with focus on ITC0249 ‘Calcutta 4’
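The missing-data, minor allele frequency and two-group filters described above can be approximated in a few lines. This Python sketch is an illustration under stated assumptions, not GIGWA's implementation: it counts allele frequencies over all allele copies (so triploid calls such as `0/0/1` are handled), and it tests one simple pattern, in which each group is homogeneous for its own genotype and the two genotypes differ. That pattern is similar in spirit to the contrast in Figure 2b. The thresholds in `keep_site` are arbitrary examples.

```python
from typing import Optional, Sequence

# A call is a tuple of allele indices, e.g. (0, 0, 1) for a triploid
# heterozygote; None marks a missing call ("./." in VCF).
Genotype = Optional[tuple[int, ...]]


def parse_gt(gt: str) -> Genotype:
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(int(a) for a in alleles)


def missing_rate(calls: Sequence[Genotype]) -> float:
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the rarer of REF (0) vs. any ALT, over all allele copies."""
    copies = [a for c in calls if c is not None for a in c]
    if not copies:
        return 0.0
    alt = sum(a != 0 for a in copies) / len(copies)
    return min(alt, 1.0 - alt)


def keep_site(calls: Sequence[Genotype], max_missing: float = 0.2, min_maf: float = 0.05) -> bool:
    # Illustrative thresholds only; they should be chosen per analysis.
    return missing_rate(calls) <= max_missing and minor_allele_frequency(calls) >= min_maf


def discriminates(group1: Sequence[Genotype], group2: Sequence[Genotype]) -> bool:
    """True if each group shows a single non-missing genotype and the two differ."""
    g1 = {tuple(sorted(c)) for c in group1 if c is not None}
    g2 = {tuple(sorted(c)) for c in group2 if c is not None}
    return len(g1) == 1 and len(g2) == 1 and g1 != g2
```

For a site called `0/0`, `0/0`, `./.` in one group and `1/1`, `1/1` in the other, `discriminates` returns `True`. Missing calls are ignored within each group but still count towards the site's missing-data rate.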
The catalog can support a range of research analyses, from genetic diversity studies to gene-trait association. Of particular interest, a set of SNP markers was characterized for a panel of 105 accessions to provide genebank users with genetic datasets ready for genome-wide association studies (GWAS) once phenotyping data are obtained (Sardos et al., 2016).
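Once phenotypes are available, the simplest first look at a GWAS-ready dataset is a single-marker scan. The sketch below converts GT strings to alternate-allele dosages (from 0 up to the ploidy level) and reports, for each SNP, the squared correlation between dosage and phenotype. It is deliberately naive: it ignores population structure and kinship, which real GWAS models (e.g., mixed linear models) must account for, and it computes no p-values. All names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def dosage(gt: str) -> Optional[int]:
    """Count non-reference alleles in a GT string ('0/1/1' -> 2); None if missing.
    Works for any ploidy, so triploid dosages range from 0 to 3."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between dosage and phenotype."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant trait: no signal
    return sxy * sxy / (sxx * syy)


def scan(genotypes: dict[str, Sequence[str]], phenotype: Sequence[float]) -> dict[str, float]:
    """Naive single-marker scan: r^2 per SNP, using only accessions with a call."""
    results = {}
    for snp_id, gts in genotypes.items():
        pairs = [(d, p) for d, p in ((dosage(g), p) for g, p in zip(gts, phenotype)) if d is not None]
        if len(pairs) < 3:
            continue  # too few called accessions to say anything
        xs, ys = zip(*pairs)
        results[snp_id] = r_squared(xs, ys)
    return results
```

Ranking SNPs by this score only points to candidate regions; in a structured panel such as wild and cultivated bananas, many high scores will reflect ancestry rather than trait control.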
There has been much recent attention to the concern that such high levels of genotypic and phenotypic information associated with germplasm accessions would enable new breeding techniques (NBTs) that bypass the access and benefit-sharing (ABS) arrangements currently linked to the distribution of physical material (Aubry, 2019; Halewood, Chiurugwi et al., 2018; Smyth et al., 2020). At present, any genebank user (e.g., researcher, breeder) can order plants and sequence them without further obligations, and many organizations have already made such datasets publicly available for a wide range of crops. As potential solutions are developed (Scholz et al., 2020), banana, an important crop that is challenging to breed, should not be overlooked, since access to its genetic and phenotypic data may contribute significantly to its improvement (Gaffney et al., 2020).
This catalog aims to provide open access to genomic resources in an equitable way, ultimately benefiting all, including people in low-income countries (Halewood et al., 2017). It does not include gene functions, but it is linked to a genome browser in the Banana Genome Hub, which contains gene annotations for the reference banana genomes (Droc et al., 2013). However, because banana is not a model plant, most gene function information is inferred by bioinformatics methods (e.g., homology-based prediction). Moreover, given the polyploid genomes of numerous banana cultivars, many agronomic traits can be expected to be under complex genetic control, requiring innovative approaches to investigate the role of apparent gene redundancy (Cenci et al., 2019; D’Hont et al., 2012). We have not yet reached the stage where one can easily pick a gene variant coding for a specific trait and select material of interest for crop improvement. Significant research is still needed to better understand the physiology and genetic architecture of traits in banana. Phenotyping experiments for various traits, including fruit quality, are also still lacking, which may hinder the adoption of improved hybrids (Thiele et al., 2020). Furthermore, new plant breeding techniques such as gene editing will have to be fine-tuned for banana, even though some encouraging studies have recently been published (Tripathi et al., 2019; Zorrilla-Fontanesi et al., 2020). Finally, regulatory frameworks for edited crops have yet to be established in many countries (Schmidt et al., 2020). While future policy options are being worked out, training in the use of such catalogs should be strengthened, particularly for breeders in national programs in low-income countries, supported by dedicated funding schemes.
CONCLUSIONS
A digital catalog of genetic variants is available for banana and is directly linked to the diversity held in the ITC genebank. It is accessible online as a proof of concept for the exploration and export of SNP datasets. We adapted the system with the objective of keeping the genetic information connected to the physical material maintained in the genebank. Users can browse genetic information, identify interesting material and order it online for further investigation and use in breeding programs. While many genebanks are wondering whether managing high-density markers falls within their scope, the GIGWA web application offers a simple and elegant solution. At a reasonable transaction cost, its framework can be extended to any germplasm collection.
Several challenges remain. First, on the technical side, the datasets are stored as clusters of accessions resulting from individual studies, and merging datasets from different projects and sequencing platforms is a challenging task. Second, from a financial perspective, funder investment is needed to complete the genotyping of the whole collection. Given the relatively small size of the clonal banana collection (1,600 accessions compared with the 773,000 accessions managed by the CGIAR collections in total), this would not require a massive investment. Finally, to comply with international rules, further developments will have to take into account the agreements on digital sequence information (DSI) and on access and benefit-sharing (ABS) currently being debated within the frameworks of the Convention on Biological Diversity and the International Treaty on Plant Genetic Resources for Food and Agriculture.
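The core of the merging problem can be illustrated with a sketch that outer-joins two SNP datasets on identical (chromosome, position, REF, ALT) keys. It assumes both datasets were called against the same reference genome assembly. It keeps "not genotyped in this study" (`None`) distinct from a missing call (`./.`), and it leaves sites whose alleles disagree as separate keys for review. The data structures and the `merge_datasets` name are hypothetical, not the catalog's storage format.

```python
from typing import Mapping, Optional

Site = tuple[str, int, str, str]            # (chromosome, position, REF, ALT)
Dataset = Mapping[Site, Mapping[str, str]]  # site -> {accession: GT string}


def merge_datasets(a: Dataset, b: Dataset) -> dict[Site, dict[str, Optional[str]]]:
    """Outer-join two SNP datasets on identical (chrom, pos, REF, ALT) keys.

    Accessions absent from a study get None (not genotyped there), which differs
    from './.' (genotyped but not called). If the same accession appears in both
    datasets, the call from `b` overwrites the one from `a`; a real pipeline
    would reconcile such conflicts instead.
    """
    samples = sorted({s for d in (a, b) for calls in d.values() for s in calls})
    merged: dict[Site, dict[str, Optional[str]]] = {}
    for site in sorted(set(a) | set(b)):
        row: dict[str, Optional[str]] = {s: None for s in samples}
        for d in (a, b):
            row.update(d.get(site, {}))
        merged[site] = row
    return merged
```

Even in this toy form, sites present in only one study end up mostly `None` for the other study's accessions. A real merge would also need to reconcile differences in marker density and genotype quality between platforms (e.g., GBS vs. RNA sequencing), which is what makes the task challenging.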
ACKNOWLEDGMENTS
We thank the CGIAR Research Program on Roots, Tubers and Bananas (RTB) and the Directorate-General for Development Cooperation and Humanitarian Aid of the Belgian Development Cooperation for their financial support.
AUTHORS' CONTRIBUTIONS
M.R. led the writing of the manuscript with critical input from J.S., N.R. and S.C.C. I.V.D.H. facilitated the distribution of plant material. J.S. and M.R. coordinated genotyping sequence production, and S.C.C. and M.R. coordinated transcriptomics sequence production. C.B. and M.R. performed bioinformatics analyses and data management. G.S. and V.G. developed and deployed software for data management. N.R. acquired the project funding and supervised the team. All authors contributed to the draft and gave final approval for publication.
© 2022. This work is published under http://creativecommons.org/licenses/by/4.0/.
Abstract
Societal Impact Statement
Global production of bananas, among the top 10 food crops worldwide, is under threat. Increasing the use of germplasm conserved in genebanks is crucial. However, lack of access, or difficult access, to genetic diversity information limits the efficient utilization of these valuable resources. Here, we present a digital catalog of high‐density markers for banana germplasm conserved at the international banana collection. By facilitating access to subsets of genetic diversity information, the catalog has the potential to maximize the conservation and use of climate‐ready varieties and to optimize breeding strategies. The catalog can be extended with data from any banana collection, and the software is easily deployable in other crop genebanks.
; Sardos, Julie 1 ; Sempéré, Guilhem 2 ; Breton, Catherine 1 ; Guignon, Valentin 1 ; Van den Houwe, Ines 3 ; Carpentier, Sebastien C 4 ; Roux, Nicolas 1
1 Bioversity International, Parc Scientifique Agropolis II, Montpellier, France
2 CIRAD, UMR INTERTRYP, Montpellier, France; INTERTRYP, Univ Montpellier, CIRAD, Montpellier, France
3 Bioversity International, Leuven, Belgium
4 Bioversity International, Leuven, Belgium; Laboratory of Tropical Crop Improvement, Division of Crop Biotechnics, Leuven, Belgium




