The order Myrtales is one of the most species-rich lineages within the Superrosidae clade, with the majority of its species distributed across five families: Myrtaceae, Melastomataceae, Lythraceae, Onagraceae, and Combretaceae. Despite the ecological and economic importance of the Myrtales, its phylogenetic relationships remain unresolved, with previous studies based on gene fragments and plastid genomes yielding inconsistent results. Genomic data, particularly single-copy/low-copy nuclear genes, provide valuable insights for resolving these phylogenetic relationships. However, phylogenetic studies still lack sufficient clade coverage, particularly for less studied families such as the Onagraceae. Epilobieae is an important tribe of the Onagraceae, characterized by substantial variation in chromosome number and by whole genome duplication (WGD) events. Nevertheless, the mechanisms of its chromosomal evolution remain unresolved. Here, we sequenced and assembled the genomes of two representative species from the Epilobieae, Chamerion angustifolium (formerly Chamaenerion angustifolium) and Epilobium hirsutum, with genome sizes of 636.59 Mb and 400.23 Mb, respectively. Genome evolution analysis revealed two WGD events, during which the chromosome number increased from \(n = 9\) to \(n = 18\), followed by aneuploid reduction, leading to the diverse chromosome numbers observed within this tribe. The WGD retained genes are enriched in Environmental information processing pathways, potentially enhancing resistance to biotic and abiotic stresses. These genes also show a preference for multiple exons, which may promote alternative splicing and functional diversification. Additionally, by integrating genomic data from 24 Myrtales species, we reconstructed a robust phylogenetic framework based on 994 single-copy/low-copy orthogroups. Our results supported Combretaceae as the sister group to Myrtaceae and Melastomataceae, providing new insights into the evolutionary relationships within the Myrtales.
Background
The order Myrtales is the third largest lineage within the Superrosidae clade [1]. According to the APG IV [2], Myrtales comprises nine families with a total of 13,005 species. The species distribution across these families is highly uneven, with the families Onagraceae, Myrtaceae, Lythraceae, Melastomataceae, and Combretaceae being the most species-rich. This order exhibits rich species diversity and a wide range of life forms, including herbs, shrubs, trees, and mangroves [1]. A considerable body of research has focused on the medicinal [3], economic [4], and horticultural [5] aspects of Myrtales species. However, a comprehensive and well-resolved phylogenetic framework for the Myrtales is still lacking: the relationships among its families remain unresolved, with different studies reaching different conclusions depending on their methodological approaches and data analysis techniques.
A well-resolved phylogenetic framework is crucial for investigating speciation and diversification within the Myrtales, which is a fundamental ecological question. The sister group relationships between Myrtaceae and Melastomataceae and between Lythraceae and Onagraceae are relatively well established, while the placement of the Combretaceae remains a subject of ongoing debate [6,7,8,9,10]. Three alternative hypotheses exist: (1) Combretaceae is the sister group to the remaining Myrtales families [6, 7, 10]; (2) Combretaceae is sister to the Lythraceae-Onagraceae clade, and also sister to the other families of Myrtales [8, 9]; and (3) Combretaceae clusters with the Myrtaceae-Melastomataceae clade, while the Lythraceae and Onagraceae form another clade, sister to the other Myrtales families. Since Conti et al. [9] first analyzed the phylogenetic relationships of Myrtales based on the rbcL gene, subsequent studies have predominantly relied on plastid regions [6, 7] and specific exons [10], which suffer from limited resolution, gene flow interference, and incomplete coverage. With the advent of the genomic era, conserved single-copy/low-copy nuclear genes identifiable among different species have provided new methods for resolving phylogenetic relationships among distantly related taxa [5, 11,12,13,14]. Currently, whole genome data are available for species from five families within Myrtales, which provides the foundation for identifying single-copy/low-copy genes at the genomic level for phylogenetic research. However, phylogenetic studies still lack sufficient coverage, particularly for less studied families such as the Onagraceae.
The family Onagraceae, with species primarily distributed across two tribes, Onagreae (13 genera/260 species) and Epilobieae (2/173), shows considerable diversity [2]. The Epilobieae is represented by Chamerion (= Chamaenerion) and Epilobium, with the latter comprising 165 species. This tribe has a highly variable chromosome number (\(n = 9, 10, 12, 13, 15, 16, 18, 19, 30, 36, 54\)), and polyploidy (or whole genome duplication, WGD) is likely associated with these diverse chromosome counts and with the tribe's evolutionary history [15, 16]. Polyploidy serves as a significant driving force in eukaryotic evolution, particularly in angiosperms [17], and is related to significant evolutionary transitions and the adaptive radiation of species [18]. At the chromosome level, polyploidy is often accompanied by extensive chromosomal rearrangements such as fusion, splitting, inversion, translocation, and deletion. These alterations are significant contributing factors to chromosomal speciation [19, 20]. At the genetic level, polyploidy confers an additional copy of every gene by doubling the entire genome, thereby increasing the reservoir of genetic material for evolutionary processes to act upon [21, 22]. Therefore, investigating how chromosomes and duplicated genes evolve after polyploidy will deepen understanding of the evolutionary history of lineages characterized by WGD.
The chromosome evolution and polyploidy of the Epilobieae remain unclear. It has been proposed that the variable chromosome numbers in the Epilobieae arose through aneuploid reduction from \(n = 9/10\) to \(n = 6\), followed by WGD events that further generated a variety of chromosome numbers, including \(n = 12\ (6+6)\), \(n = 13\ (7+6)\), \(n = 15\ (9+6)\), and \(n = 16\ (8+8)\) [23, 24]. Other researchers have suggested that the ancestors of the Epilobieae were ancient polyploids, with Chamerion (\(n = 18, 36, 54\)) being the sister group to section Epilobium of Epilobium, and section Epilobium being the sister group to the other sections of Epilobium. Furthermore, \(x = 18\) is considered the base chromosome number for the species in this clade [15]. Regarding polyploidization events in this group, only OEGRα (26 Mya), placed in the common ancestor of Epilobium and Oenothera, has been identified, based on transcriptomic data [16]. Therefore, more genomic data are required to fully elucidate the evolutionary and diversification history of this clade.
Here, we conducted whole-genome sequencing of both C. angustifolium (\(2n = 2x = 36\), Supplementary Fig. 1) and E. hirsutum (\(2n = 2x = 36\), Supplementary Fig. 2), which are representative species of Chamerion and Epilobium [2]. After obtaining chromosome-level reference genomes for these two species, we aimed to analyze the polyploidy events and chromosome evolution history of the Epilobieae lineage to assess the impact of polyploidy on adaptive evolution and diversification within this tribe. In addition, by combining these genomes with published genomic data from the five major families of the Myrtales, our study offers new and comprehensive insights into the contentious phylogenetic relationships within this order.
Results
Ploidy and genome size assessment
Since C. angustifolium contains autopolyploids, we confirmed that the material used in this study was diploid through karyotype analysis (Supplementary Fig. 3). Additionally, we performed a preliminary estimation of the genome sizes of C. angustifolium and E. hirsutum using flow cytometry and low-coverage genome sequencing. Flow cytometry results indicated that the genome sizes of C. angustifolium and E. hirsutum are 570.38 Mb and 364.52 Mb, respectively (Supplementary Fig. 4). Low-coverage Illumina sequencing generated 27 Gb (40.97-fold coverage) and 20 Gb (46.84-fold coverage) of data for C. angustifolium and E. hirsutum, respectively. Using K-mer analysis, we further estimated the genome sizes and heterozygosity of these species, yielding genome sizes of 659.72 Mb (heterozygosity 1.52%) for C. angustifolium and 427.94 Mb (heterozygosity 0.23%) for E. hirsutum (Supplementary Fig. 5).
Based on the results of flow cytometry and K-mer analysis, we determined the required genome coverage for subsequent whole genome sequencing. For the C. angustifolium and E. hirsutum genome sequencing, PacBio sequencing generated 32 Gb (48.56-fold coverage) and 35 Gb (81.97-fold coverage), and Hi-C sequencing generated 80 Gb (126.38-fold coverage) and 40 Gb (101.03-fold coverage), respectively.
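As a rough check on the depths quoted above, fold coverage can be computed as total sequenced bases divided by an estimated genome size. The sketch below assumes decimal units (1 Gb = 1,000 Mb) and, as a further assumption, uses the K-mer genome size estimates as the denominator; the function names are illustrative and are not part of the study's pipeline. Because the data volumes reported in the text are rounded, the computed depths differ slightly from the published values.

```python
def fold_coverage(data_gb: float, genome_mb: float) -> float:
    """Sequencing depth = total bases sequenced / genome size (1 Gb = 1,000 Mb)."""
    if genome_mb <= 0:
        raise ValueError("genome size must be positive")
    return data_gb * 1000.0 / genome_mb


def data_needed_gb(target_depth: float, genome_mb: float) -> float:
    """Inverse calculation: data volume (Gb) needed to reach a target depth."""
    if target_depth < 0 or genome_mb <= 0:
        raise ValueError("depth must be non-negative and genome size positive")
    return target_depth * genome_mb / 1000.0


if __name__ == "__main__":
    # Illumina data volumes and K-mer genome size estimates taken from the text.
    runs = [
        ("C. angustifolium", 27.0, 659.72),
        ("E. hirsutum", 20.0, 427.94),
    ]
    for species, data_gb, genome_mb in runs:
        print(f"{species}: {fold_coverage(data_gb, genome_mb):.2f}-fold")
```

The inverse helper `data_needed_gb` illustrates how a preliminary genome size estimate can be turned into a sequencing target before long-read or Hi-C libraries are ordered.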
Genome assembly and evaluation
With PacBio sequencing, the assembly size of C. angustifolium was 636.59 Mb and E. hirsutum was 400.23 Mb. We then used Hi-C data to assemble the two Epilobieae genomes to the chromosomal level. Ultimately, 573.15 Mb of the C. angustifolium assembly and 360.99 Mb of the E. hirsutum assembly were distributed across 18 chromosome-level pseudomolecules in their respective genomes (Table 1, Supplementary Tables 1 and 2, and Supplementary Fig. 6).
We employed Benchmarking Universal Single-Copy Orthologs (BUSCO) [25] to evaluate the completeness of these genomes, which revealed that the C. angustifolium and E. hirsutum genomes were 98.5% and 98.1% complete, respectively (Supplementary Table 3). The short reads from Illumina sequencing were aligned to the assembled genomes of C. angustifolium and E. hirsutum for genome polishing, achieving mapping rates of 98.70% and 99.59%, and coverage rates of 99.90% and 99.88%, respectively. The core eukaryotic genes mapping approach (CEGMA) [26] assessment indicated that 237 and 242 core eukaryotic genes (CEGs) were assembled in the genomes of C. angustifolium and E. hirsutum, respectively. Together, these metrics indicate that both genome assemblies are of high quality. The overall characteristics of both genomes are presented in Fig. 1.
Genome annotation
Annotation of the repetitive sequences, including tandem repeats and transposable elements, in both Epilobieae genomes revealed that they constitute 66.28% and 69.69% of the C. angustifolium and E. hirsutum genomes, respectively. Long terminal repeat retrotransposons (LTR-RTs) were found to be the most prevalent repetitive sequences, making up 55.63% of the C. angustifolium genome and 55.26% of the E. hirsutum genome. Both species have few short interspersed nuclear elements (SINEs) (Supplementary Table 4). The E. hirsutum and C. angustifolium genomes were predicted to contain 27,135 and 33,182 protein-coding genes (CDSs), respectively, with 98.68% of the CDSs in the two Epilobieae genomes functionally annotated (Supplementary Tables 5 and 6). Non-coding RNA statistics are shown in Supplementary Table 7.
Phylogenetic analysis of Myrtales
We identified 1,256 single-copy/low-copy orthogroups from 24 species within the Myrtales, as well as two outgroup species, Vitis vinifera and Oryza sativa (Appendix 1). These orthogroups were organized into four datasets based on species and clade coverage. The coalescent trees constructed using these four datasets demonstrated strong robustness and consistency in the phylogenetic relationships of the Myrtales, with all branches receiving high support (local posterior probability, \(\text{LPP} > 0.7\), where LPP values between 0.7 and 1.0 are considered indicative of strong support [10]) (Supplementary Figs. 7–10). The results showed that the Onagraceae and the Lythraceae form a sister group, as do the Melastomataceae and the Myrtaceae. Additionally, the Combretaceae is sister to the clade of Melastomataceae-Myrtaceae (Fig. 2).
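The criteria used to build the four datasets are not spelled out in this section, so the following is only a hypothetical sketch of how orthogroups might be filtered by species coverage and copy number. The thresholds `min_species_fraction` and `max_copies`, the function name, and the input layout (orthogroup ID mapped to per-species gene counts) are all assumptions made for illustration.

```python
from typing import Dict, List


def select_orthogroups(
    copy_counts: Dict[str, Dict[str, int]],
    all_species: List[str],
    min_species_fraction: float = 0.8,  # hypothetical coverage threshold
    max_copies: int = 2,                # hypothetical "low-copy" ceiling
) -> List[str]:
    """Keep orthogroups present in enough species and low-copy in all of them."""
    if not all_species:
        raise ValueError("species list must not be empty")
    kept = []
    for orthogroup, counts in copy_counts.items():
        n_present = sum(1 for s in all_species if counts.get(s, 0) > 0)
        coverage = n_present / len(all_species)
        low_copy = all(counts.get(s, 0) <= max_copies for s in all_species)
        if coverage >= min_species_fraction and low_copy:
            kept.append(orthogroup)
    return sorted(kept)


if __name__ == "__main__":
    species = ["sp_a", "sp_b", "sp_c", "sp_d"]
    toy = {
        "OG1": {"sp_a": 1, "sp_b": 1, "sp_c": 1, "sp_d": 1},  # single-copy, complete
        "OG2": {"sp_a": 1, "sp_b": 5, "sp_c": 1, "sp_d": 1},  # multi-copy in sp_b
        "OG3": {"sp_a": 1, "sp_b": 1},                        # missing in half
    }
    print(select_orthogroups(toy, species, min_species_fraction=0.75))
```

Varying the two thresholds yields nested datasets of different stringency, which is one way to test whether a topology is stable under different levels of missing data.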
Comparative genomics of the Epilobieae
Gene family cluster analysis among the 16 species revealed the presence of 29,858 gene families (orthogroups) containing 485,814 genes across all the species (Supplementary Table 8). We calculated the number of genes in each orthogroup, showing that the C. angustifolium and E. hirsutum genomes contain fewer single-copy genes compared to the other species (Fig. 3B).
Gene family dynamics analysis revealed that the C. angustifolium genome contains 1,906 expanded gene families and 1,009 contracted gene families, whereas the E. hirsutum genome has 618 expanded gene families and 2,223 contracted gene families (Fig. 3A). Furthermore, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was conducted on the gene families that had significantly expanded or contracted (\(p < 0.05\)) in each species. In the C. angustifolium genome, 1,070 genes of the expanded gene families were enriched in the pathways of “Oxidative phosphorylation” and “Selenocompound metabolism”, while 127 genes of the contracted gene families were enriched in the pathways of “Zeatin biosynthesis” and “Glutathione metabolism” (Supplementary Figs. 11A and 12A, and Supplementary Data 1). In the E. hirsutum genome, 316 genes of the expanded gene families were enriched in the pathways of “Oxidative phosphorylation” and “Ribosome”, while 387 genes of the contracted gene families were enriched in the pathways of “Plant-pathogen interaction” and “Endocytosis” (Supplementary Figs. 11B and 12B, and Supplementary Data 1).
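The statistical test behind the KEGG enrichment is not stated in this section. A common choice for over-representation analysis is the one-sided hypergeometric test, sketched below under that assumption. The function name and the toy counts are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(
    population: int,         # annotated genes in the genome (background)
    pathway_total: int,      # background genes assigned to the pathway
    sample: int,             # genes in the set of interest (e.g. expanded families)
    pathway_in_sample: int,  # genes of interest assigned to the pathway
) -> float:
    """One-sided hypergeometric p-value: P(X >= pathway_in_sample)."""
    if not (0 <= pathway_total <= population and 0 <= sample <= population):
        raise ValueError("counts must satisfy 0 <= K <= N and 0 <= n <= N")
    upper = min(pathway_total, sample)
    if pathway_in_sample > upper:
        return 0.0
    lower = max(pathway_in_sample, 0)
    favourable = sum(
        comb(pathway_total, i) * comb(population - pathway_total, sample - i)
        for i in range(lower, upper + 1)
    )
    return favourable / comb(population, sample)


if __name__ == "__main__":
    # Toy example: 40 of 500 selected genes fall in a pathway that holds
    # 300 of 20,000 background genes (all numbers are made up).
    print(f"p = {enrichment_p_value(20000, 300, 500, 40):.3e}")
```

Exact integer arithmetic via `math.comb` avoids a third-party dependency; for many pathways at genome scale, a vectorized library routine and a false discovery rate correction would normally be preferable.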
We then inferred the divergence times of the 16 species. The two Epilobieae species diverged from each other 20.9 Mya [95% confidence interval (CI): 9.4–36.8 Mya], and their lineage diverged from Clarkia xantiana (tribe Onagreae) 30.8 Mya (95% CI: 14.5–50.4 Mya). The crown age of Myrtales is 106.9 Mya (95% CI: 100.1–112.8 Mya) (Fig. 3A).
Chromosome evolution following the polyploidization
The distribution of synonymous substitutions per synonymous site (\(K_\text{s}\)) exhibited two peaks in the genomes of C. angustifolium (\(K_\text{s} \approx 0.36\) and \(K_\text{s} \approx 1.04\)), E. hirsutum (\(K_\text{s} \approx 0.42\) and \(K_\text{s} \approx 1.12\)), and Cl. xantiana (\(K_\text{s} \approx 0.21\) and \(K_\text{s} \approx 1.00\)), indicating that two WGD events occurred in the evolutionary history of each species. The \(K_\text{s}\) distribution in the S. alba genome showed one peak (\(K_\text{s} \approx 0.59\)), which corresponds to the whole genome triplication (WGT) event of this species [27] (Fig. 4A). Based on the \(K_\text{s}\) divergence peaks observed between each pair of these four species, the two Epilobieae species and Cl. xantiana shared a WGD event (WGDA) and then diverged into two clades. Subsequently, the two Epilobieae species experienced another WGD event (WGDE) before their divergence, while Cl. xantiana underwent a lineage-specific WGD event (WGDC) (Fig. 4B). Furthermore, we estimated that the WGDA event occurred 84.9 Mya and the WGDE event 30.6 Mya (Fig. 3A).
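How the WGD dates were obtained is not described in this section. For orientation only, a widely used back-of-the-envelope conversion treats a \(K_\text{s}\) peak as the divergence between paralogs accumulating substitutions at a constant rate \(r\), giving \(T = K_\text{s} / (2r)\). The sketch below applies that formula; the rate of 6.0e-9 substitutions per synonymous site per year is a purely hypothetical value chosen for illustration and is not taken from the study.

```python
def ks_to_mya(ks_peak: float, subs_per_site_per_year: float) -> float:
    """Convert a Ks peak to an age in Mya using T = Ks / (2 * r)."""
    if ks_peak < 0:
        raise ValueError("Ks must be non-negative")
    if subs_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    years = ks_peak / (2.0 * subs_per_site_per_year)
    return years / 1e6


if __name__ == "__main__":
    ASSUMED_RATE = 6.0e-9  # hypothetical rate, NOT from the paper
    # Ks peak values for C. angustifolium as reported in the text.
    for label, ks in [("younger peak", 0.36), ("older peak", 1.04)]:
        print(f"{label}: Ks = {ks:.2f} -> {ks_to_mya(ks, ASSUMED_RATE):.1f} Mya")
```

Such estimates are sensitive to the assumed rate and to saturation at high \(K_\text{s}\), so they serve as a sanity check on, rather than a substitute for, a calibrated dating analysis.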
To investigate the chromosomal evolution history of the Epilobieae, we first inferred the ancestral chromosome numbers across the phylogenetic tree using ChromEvol [28]. The results indicated that the chromosome number of the most recent common ancestor (MRCA) of the two Epilobieae species and Cl. xantiana was \(n = 9\), while the MRCA of the two Epilobieae species had a chromosome number of \(n = 18\) (Supplementary Fig. 13). The \(K_\text{s}\) values of the conserved adjacent collinear genes and blocks of C. angustifolium and E. hirsutum are close to the \(K_\text{s}\) peak values of WGDE (\(K_\text{s} \approx 0.36\) and \(K_\text{s} \approx 0.42\), respectively), indicating that these genes/blocks were retained after WGDE (Supplementary Fig. 14). The genomes of C. angustifolium and E. hirsutum have accumulated large-scale chromosomal rearrangements. Two chromosomes of each species together form a collinear block unit, such as chromosomes 4 and 9 of C. angustifolium and chromosomes 1 and 12 of E. hirsutum (Fig. 4C).
Genes retained after WGD
We used a Poisson generalized linear mixed-effects model (GLMM) to analyze the factors contributing to variation in duplicate gene retention after WGD. We extracted 5,617 and 6,154 duplicated genes retained after WGDA and WGDE, respectively, from the 33,182 genes of the C. angustifolium genome. For the E. hirsutum genome, 5,039 and 5,385 duplicated genes were extracted from 27,135 genes. Additional characteristics of all genes of the two species, including Species, KEGGtype, Length, GC, nExon, and Chrom, were tallied (Supplementary Data 2). Since the variables GC and Species showed high multicollinearity, we excluded GC and refitted the GLMM (Supplementary Table 9). The model results are shown in Supplementary Table 10 and Table 2. All factors except Length significantly affected duplicate gene retention after the two WGD events. Genes with functions related to Environmental Information Processing, Organismal Systems, and Metabolism, and genes with more exons, especially in SpeciesA (C. angustifolium), tend to be retained after WGD.
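To put the retained-gene counts above in proportion, the short sketch below expresses each count as a percentage of the species' annotated gene set. It uses only the numbers quoted in the text; the function and variable names are illustrative, and this descriptive summary is separate from the GLMM itself.

```python
def retention_percent(retained: int, total_genes: int) -> float:
    """Percentage of annotated genes that are retained WGD duplicates."""
    if total_genes <= 0 or not 0 <= retained <= total_genes:
        raise ValueError("require 0 <= retained <= total_genes and total_genes > 0")
    return 100.0 * retained / total_genes


if __name__ == "__main__":
    # (species, total genes, retained after WGDA, retained after WGDE)
    counts = [
        ("C. angustifolium", 33182, 5617, 6154),
        ("E. hirsutum", 27135, 5039, 5385),
    ]
    for species, total, wgda, wgde in counts:
        print(
            f"{species}: WGDA {retention_percent(wgda, total):.2f}%, "
            f"WGDE {retention_percent(wgde, total):.2f}%"
        )
```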
To further explore the functional preferences of WGD retained genes, we conducted KEGG enrichment analysis of WGDA and WGDE retained genes in both C. angustifolium and E. hirsutum (Supplementary Data 3). The WGDA retained genes were significantly enriched in Environmental information processing pathways such as “Plant hormone signal transduction” and “Phosphatidylinositol signaling system”, as well as in Metabolism pathways such as “Flavonoid biosynthesis” and “Flavone and flavonol biosynthesis”. The WGDE retained genes were likewise significantly enriched in the Environmental information processing pathways “Plant hormone signal transduction” and “Phosphatidylinositol signaling system”, as well as in Metabolism pathways including “Inositol phosphate metabolism” and “Alanine, aspartate and glutamate metabolism” (Supplementary Figs. 15 and 16).
Discussion
Given that the genome of C. angustifolium has already been published on public platforms (https://www.ncbi.nlm.nih.gov/nuccore/CAMPFE000000000.1), it is necessary to explain the significance of resequencing this species. Our study focuses on the polyploidy and chromosomal evolution history of the species, and using wild individuals provides a more comprehensive reflection of its evolutionary process in the natural environment. In contrast, the published genome was derived from a specimen collected at the Royal Botanic Gardens, Kew, which may be a cultivated individual and therefore does not fully represent the genetic diversity found in the natural environment. Although the high heterozygosity of the wild individuals resulted in slightly lower continuity and completeness in our genome assembly, the relatively high BUSCO score (Supplementary Table 3) indicates that our genome successfully retains most of the key core genes, demonstrating the effectiveness of the assembly.
Phylogenetic relationships among five families within the Myrtales
Single-copy/low-copy nuclear genes are invaluable in phylogenetic studies due to their straightforward evolutionary patterns, conservation across species, and ability to provide high-resolution phylogenetic signals [29, 30]. These genes have been widely utilized in plant phylogenetic research at various taxonomic levels, including major angiosperm clades [12], as well as at the family [11, 31, 32] and genus [33] levels. Here, we integrated the genomic data of the two Onagraceae species sequenced in this study with data from 22 previously published species of the Myrtales to identify conserved low-copy genes. For the Combretaceae, a family with a particularly controversial phylogenetic position, we included genomic data from five published species, representing the most diverse subfamily of the Combretaceae, which includes the tribes Laguncularieae and Combreteae [2]. For the Myrtaceae, we selected representative species from four tribes.
Although our sample size is relatively limited, low-copy genes have been shown to be effective in resolving phylogenetic relationships among distantly related species [13]. The Myrtales is a highly significant clade with considerable medicinal, economic, and horticultural value [3,4,5]. Previous phylogenetic studies of the Myrtales have primarily relied on molecular markers such as chloroplast genes (e.g., rbcL, ndhF, matK) [6, 9], mitochondrial genes (e.g., matR), nuclear genes (e.g., 16S rRNA and 18S rRNA) [7], chloroplast genomes [8], and Angiosperms353 exons [10]. Research has shown that WGD events are frequent within the families of Myrtales. For example, the genome of Melastoma dodecandrum has undergone at least three WGD events [5], while S. alba has experienced a WGT event [27], leading to a high proportion of repetitive sequences in their genomes. Compared to nuclear genes, chloroplast genes exhibit relatively low variation, and exons are still insufficient for resolving the phylogenetic tree of Myrtales. In this study, we identified 994 single-copy/low-copy orthogroups across 24 species of the Myrtales and two outgroup species, and reconstructed the phylogenetic relationships of the five major Myrtales families using the coalescent-based method. Coalescent-based methods infer species trees by modeling the branching patterns of multiple genes across different lineages, and are particularly effective in studying phylogenetic relationships among distantly related species [34, 35]. For example, they have enabled significant progress in the phylogenomic frameworks for green plants [36], the five major angiosperm lineages [37], and rosids [38]. Our analysis produced robust and consistent phylogenetic relationships for the five Myrtales families and yielded strong support for the clade in which Combretaceae is sister to Myrtaceae and Melastomataceae (Fig. 2, and Supplementary Figs. 7–10). This extensive nuclear gene-based evolutionary tree offers a refined resolution of the complex phylogenetic relationships within the Myrtales.
Gene family evolution and chromosome evolution of the Epilobieae
We estimated the divergence time of the crown node of the Myrtales at 106.9 Mya (95% CI: 100.1–112.8 Mya) (Fig. 3A), which is in close agreement with previous studies, including those by Berger et al. [7] (~116 Mya), Magallón et al. [39] (~117 Mya), and Zhang et al. [8] (~105 Mya). Additionally, our divergence time estimate for the two Epilobieae species (20.9 Mya, 95% CI: 9.4–36.8 Mya) is consistent with the previous estimate of 14.1 Mya (95% CI: 6.4–30.5 Mya) [40], as the two confidence intervals overlap broadly, further supporting the robustness of our dating analysis.
Our analysis of WGD events reveals that E. hirsutum, C. angustifolium, and Clarkia xantiana share the WGDA event (~85 Mya). After diverging into the Epilobieae and Clarkia lineages, each lineage underwent an additional WGD event (Fig. 4A). Previous studies based on the 1000 Plants (1KP) transcriptome data identified a WGD event, OEGRα, in Oenothera grandiflora, and placed it at the MRCA of Epilobium and Oenothera (26 Mya) [16]. Oenothera and Clarkia belong to the tribe Onagreae, whereas Epilobium and Chamerion belong to the Epilobieae. After diverging from Cl. xantiana, the two Epilobieae species experienced a lineage-specific WGDE. Therefore, we suggest that OEGRα likely corresponds to the WGDC event of Cl. xantiana, rather than being shared with Epilobium. This discrepancy may stem from the limitations of transcriptome data, which include only expressed genes. Changes in gene copy numbers and repetitive sequences present in the genome may not be fully captured in transcriptome data, potentially introducing bias in the inference of WGD events.
Based on the WGD events and chromosome number evolution of C. angustifolium and E. hirsutum, we inferred that the chromosome number in the MRCA of C. angustifolium and E. hirsutum increased from \(n = 9\) to \(n = 18\) following the WGDE event. For other species in the Epilobieae, we hypothesize that if WGDE was shared across the entire clade, their chromosome numbers could have resulted from aneuploid reduction following this WGD event. This process might explain the derivation of chromosome numbers such as \(n = 9, 10, 12, 13, 15, 16\) from \(x = 18\). Meanwhile, chromosome numbers greater than \(n = 18\) in the Epilobieae likely originated from species-specific polyploidization or hybridization events. For example, C. angustifolium underwent autopolyploidization, resulting in tetraploids (\(2n = 4x = 72\)) and hexaploids (\(2n = 6x = 108\)) [41]. Similarly, Epilobium subdentatum arose from a hybridization event between Epilobium torreyi (\(n = 9\)) and Epilobium densiflorum (\(n = 10\)) or their ancestors [42]. Our findings support the conclusions of Baum et al. [15] concerning chromosomal evolution and ancient polyploidy in the Epilobieae. Due to large-scale genomic rearrangements, we are unable to infer the chromosome number variation prior to the MRCA of the two Epilobieae species and Cl. xantiana, especially the variation caused by WGDA.
It is well known that polyploidy is accompanied by chromosomal rearrangements that inhibit gene flow and promote chromosomal speciation [43]. Previous studies suggest that speciation events involving polyploidy and changes in chromosome number account for approximately 7% of speciation events in ferns and 2–4% in angiosperms [44]. In the tribe Epilobieae, which is one of the most species-rich clades in the Onagraceae, hybridization frequently occurs between closely distributed species, and species differ from one another in reciprocal translocations [45,46,47]. We speculate that chromosomal rearrangements following two rounds of WGD in the Epilobieae may have contributed to the diverse chromosome numbers observed, which may in turn have promoted diversification within this clade. Further analysis of the earliest diverging taxon of the Onagraceae will provide valuable insights into the chromosomal evolutionary history preceding WGDA, which is crucial for reconstructing the evolutionary history of the entire family.
WGD retained genes related to resistance enhancement
At the genetic level, the structural and functional preferences of WGD retained genes in the two Epilobieae species may promote adaptation to different environments. Duplicated genes with more exons are more likely to be retained after polyploidy (Table 2). Exons with alternative splicing sites frequently undergo alternative splicing (AS) during transcription, resulting in different transcripts. The number of AS events is positively and linearly correlated with the number of exons in a gene [48]. AS is an important factor in the increasing cellular and functional complexity of higher organisms [49, 50]. In plants, AS plays an important role both in growth and development processes, such as the induction of flowering [51], and in responses to abiotic stress [52]. Hence, the selective retention of multi-exon genes following the WGD events in the Epilobieae may be related to the enhancement of stress resistance.
Genes retained after both WGDs were significantly enriched in “Plant hormone signal transduction” and “Phosphatidylinositol signaling system” (Supplementary Figs. 15 and 16). The former pathway is highly conserved evolutionarily and plays a critical role in plant growth and development [53]. The latter is one of the core pathways in cellular signal transduction, crucial for growth regulation, hormone signaling, and environmental stress responses, particularly during the early stages of drought stress [54]. Additionally, WGDA retained genes were significantly enriched in pathways related to the biosynthesis of flavonoids (Supplementary Fig. 15), which are crucial phytochemicals in plants [55]. These secondary metabolites are well characterized in many species of the Onagraceae and have been used to study the evolutionary relationships among different taxa [56]. Flavonoid compounds serve as UV protectants, pollinator attractants, and antimicrobial compounds, enhancing plant resistance to biotic stresses (e.g., pathogenic microorganisms and parasitic plants) and abiotic stresses (e.g., UV-B radiation, cold temperatures, salt, and drought stress) [57]. WGDE retained genes were enriched in the “Inositol phosphate metabolism” pathway (Supplementary Fig. 15). Inositol phosphates are vital signaling molecules in eukaryotes, playing an essential role in cellular signal transduction [58, 59]. These WGD retained genes, related to secondary metabolite biosynthesis, transcriptional regulation, and signal transduction, may have, to some extent, facilitated the adaptation of C. angustifolium and E. hirsutum to diverse environments [15].
Conclusions
The phylogenies based on single-copy/low-copy nuclear genes of 24 representative species from five families of the Myrtales and two outgroup species showed that the Onagraceae and the Lythraceae form a sister group, as do the Melastomataceae and the Myrtaceae. The Combretaceae is shown to be sister to the Myrtaceae-Melastomataceae clade. This robust and consistent phylogenetic framework provides new insights into the phylogeny of the Myrtales. Additionally, we propose that the chromosome number of the Epilobieae increased from an ancestral base of \(n = 9\) to \(n = 18\) following the WGDE event, and that the diverse chromosome numbers observed within this tribe resulted from aneuploid reduction from the base chromosome number of \(x = 18\). Furthermore, WGD retained genes are functionally enriched in Environmental information processing pathways and show a preference for multi-exon structure. Our findings suggest that polyploidy played an important role in the evolution and diversification of the Epilobieae.
Materials and methods
Plant material collection
All plant materials of C. angustifolium and E. hirsutum used in this study were collected from the wild at Yingxiong Bridge, Urumqi City, Xinjiang Uygur Autonomous Region (43°22′3″ N, 87°12′12.6″ E) and in Taibai County, Baoji City, Shaanxi Province (34°2′15.3″ N, 107°37′23.2″ E), respectively. All samples were stored at −80 °C prior to sequencing. No specific licenses or permissions from local government were required for our collection. The voucher specimens (CA-20210831-001, EH-20210902-001) were identified by Professor Kang Huang from the College of Life Sciences, Northwest University, and deposited in the Biological Specimen Collection, Northwest University.
Genome ploidy evaluation
Given the occurrence of polyploidy in C. angustifolium, karyotype analysis was performed using anatomical sections of root tip tissue to determine its ploidy level. Root tips about 1–2 cm in length were cut and treated with 0.2 µmol/L amiprophos-methyl solution for 4 h. After rinsing three times with tap water, the root tips were cut off and placed in 0.5 mL wet centrifuge tubes, then subjected to N2O treatment at 0.8–1.2 MPa for 1 h. The materials were then transferred to 90% acetic acid to fix them on ice for 10 min and subsequently used for chromosome preparation [60]. The results are shown in Supplementary Fig. 3.
Flow cytometry analysis
Young leaf tissues of C. angustifolium and E. hirsutum were used for flow cytometry analysis. Samples were placed in 500 µL of Nuclei Extraction buffer, chopped with a sharp blade, and filtered through a 50 µm filter after 60 s. Then, 2,000 µL of staining buffer with RNase was added, and the samples were incubated for 30 min in the dark. The nuclei suspension was analyzed using a CyFlow Space Flow Cytometer (Sysmex Partec, Muenster, Germany) and the corresponding FloMax software.
Genome sequencing and assembly
Materials of both C. angustifolium and E. hirsutum were sent to Novogene (Beijing, China), where DNA was extracted from leaf tissues using the hexadecyltrimethylammonium bromide method and randomly fragmented. Illumina and PacBio libraries were constructed, and sequencing was performed on the Illumina HiSeq and PacBio Sequel II platforms, respectively. For genome annotation, RNA was extracted from various tissues of the two species, including roots, stems, leaves, flowers, and seeds. Sequencing libraries were then constructed and sequenced on the Illumina NovaSeq 6000 platform.
After quality control of the raw Illumina data, K-mer distribution analysis was performed using SOAPdenovo r242 [61, 62] to obtain preliminary estimates of genome size and heterozygosity. The clean Illumina data were later used to polish the genome assemblies. Subsequently, high-quality PacBio long reads with quality values above Q20 were assembled using hifiasm v0.15.2 [63].
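The K-mer based size estimate can be illustrated with the commonly used relation "genome size ≈ total K-mer count / main-peak depth". The sketch below is a minimal illustration under that assumption and is not necessarily the calculation performed by the software cited above; the function name, the error cutoff `min_depth`, and the toy histogram are all hypothetical, and heterozygosity estimation is not covered.

```python
from collections.abc import Mapping


def estimate_genome_size(kmer_histogram: Mapping[int, int], min_depth: int = 3) -> float:
    """Estimate genome size (bp) as total k-mer count divided by main-peak depth.

    kmer_histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing-error k-mers and ignored
    (the cutoff value is an illustrative assumption).
    """
    usable = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth at which the most distinct k-mers occur.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences: sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram with made-up numbers (not data from this study).
toy_histogram = {1: 5000, 2: 800, 19: 200, 20: 1000, 21: 200, 40: 50}
toy_size = estimate_genome_size(toy_histogram)
```

In practice the histogram would come from a K-mer counter run on the clean Illumina reads, and the estimate is only a starting point to compare against the flow cytometry and assembly sizes.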
Hi-C sequencing
We selected young leaves to extract chromatin from the nucleus and used it to prepare Hi-C libraries. The library construction process primarily involves crosslinking plant cells with formaldehyde, followed by restriction enzyme digestion, labeling, DNA fragment ligation, reversal of crosslinking, and DNA extraction [64]. Using the clean Hi-C data, the contigs were phased and scaffolded by ALLHiC v0.9.13 [65] after polishing. We then conducted manual correction based on the interaction strength among chromosomes by Juicebox v1.3.0 [66], and the genome at the chromosome level was obtained. Finally, we used BUSCO v4 [25] to assess the genome integrity.
Assessment of genome assemblies
For genome assembly polishing, we used Burrows-Wheeler Aligner (BWA) v0.7.8 [67] to map the short reads generated from Illumina sequencing to the genome assembly. We then assessed the mapping rates and coverage to evaluate the quality of the assembly. BUSCO and CEGMA [26] analyses were performed for further assessment of sequence completeness.
Genome annotation
We combined de novo and homology-based approaches to annotate repetitive sequences using LTR_Finder v1.0.6 [68], RepeatModeler v2.0.1 [69], RepeatScout v1.0.5 [70], RepeatMasker v4.1.0 [71], and RepeatProteinMask. We used Tandem Repeats Finder v4.09 [72] to detect tandem repeats and LTR_Finder [68] to identify LTR elements.
Gene structure was predicted by combining ab initio, homology-based, and transcriptome-based strategies. AUGUSTUS v3.2.3 [73], GlimmerHMM v3.0.4 [74], GeneID v1.4 [75], SNAP v2013.11.29 [76], and GENSCAN v1.0 [77] were used for ab initio prediction, and BLAST v2.2.26 [78] and GeneWise v2.4.1 [79] for homology-based prediction. The resulting gene models were then combined with the transcriptome data and integrated into a non-redundant gene set by EVidenceModeler v1.1.1 [80], which was finally corrected by PASA v2.4.1 [81]. Non-coding RNAs were predicted using tRNAscan-SE v1.4 [82] (tRNAs), BLAST (rRNAs), and Infernal v3.0 [83] (microRNAs and small nuclear RNAs).
Gene sets were searched against protein databases, including KEGG (http://www.genome.jp/kegg/), Pfam (https://pfam.xfam.org/), GO (geneontology.org), InterPro (https://www.ebi.ac.uk/interpro/), SwissProt (http://www.UniProt.org), and NR, using BLAST to infer gene functions.
Phylogenetic analysis of Myrtales
We conducted orthogroup inference in OrthoFinder v2.5.4 [84] based on protein sequences from three species of Onagraceae, three species of Melastomataceae, five species of Lythraceae, five species of Combretaceae, eight species of Myrtaceae, and two outgroup species, V. vinifera and O. sativa. The download links for the genome data of these species can be found in Appendix 1 (accessed on 20 November 2024). The phylogenetic hierarchical orthogroups (HOGs) shared by the 26 species were used for subsequent analysis. We then retained genes that are present in at least 21 species (80% of the total species) and have a maximum of two copies in the same HOG and in any single species. To reduce the phylogenetic noise that may be caused by genes from multi-copy gene families, we performed a BLASTP search using the protein sequences of Arabidopsis thaliana (a model species with high-quality genome data) against the genes obtained in the previous step and discarded genes with five or more hits. We retained a total of 1,256 single-copy/low-copy orthogroups with sequence lengths greater than 600 bp. Based on the species and clade coverage, we then created four datasets: 994 HOGs (\(\:>50\%\) species coverage), 939 HOGs (\(\:>70\%\) species coverage), 706 HOGs (\(\:>80\%\) species coverage and \(\:>90\%\) clade coverage), and 339 HOGs (\(\:>90\%\) species coverage and \(\:>90\%\) clade coverage).
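The occupancy and copy-number filter described above can be sketched as follows. This is a minimal illustration, assuming each HOG is represented as a mapping from species name to the gene IDs it contains and reading "a maximum of two copies" as at most two genes per species; the function names and toy data are hypothetical, and clade coverage is not modelled.

```python
def passes_filter(hog: dict, min_species: int = 21, max_copies: int = 2) -> bool:
    """Keep a HOG found in >= min_species species with <= max_copies genes per species.

    hog maps species name -> list of gene IDs (empty list if the species is absent).
    """
    present = {sp: genes for sp, genes in hog.items() if genes}
    if len(present) < min_species:
        return False
    return all(len(genes) <= max_copies for genes in present.values())


def species_coverage(hog: dict, n_species: int = 26) -> float:
    """Fraction of the sampled species that have at least one gene in the HOG."""
    return sum(1 for genes in hog.values() if genes) / n_species


# Toy HOGs over 26 placeholder species (made-up data, not from this study).
species = [f"sp{i}" for i in range(26)]
hog_ok = {sp: [f"{sp}_g1"] for sp in species[:22]}
hog_ok["sp0"] = ["sp0_g1", "sp0_g2"]            # two copies are still allowed
hog_sparse = {sp: [f"{sp}_g1"] for sp in species[:20]}
hog_multicopy = {sp: [f"{sp}_g1"] for sp in species}
hog_multicopy["sp3"] = ["sp3_g1", "sp3_g2", "sp3_g3"]

kept = [name for name, hog in
        [("ok", hog_ok), ("sparse", hog_sparse), ("multicopy", hog_multicopy)]
        if passes_filter(hog)]
```

The same `species_coverage` value could then be compared against the coverage thresholds used to define the four datasets.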
Multiple sequence alignment of the nucleotide sequences of each single-copy/low-copy orthogroup was carried out using MAFFT v.7.487 [85]. To ensure data quality, the alignments were subsequently trimmed using trimAl [86] with the - automated parameter. The orthogroup phylogenetic trees were reconstructed using RAxML-NG with the GTR + G model and 1000 bootstrap replicates [87], and coalescent trees were inferred using ASTER v1.16.3.4 with default settings [88].
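As a rough sketch of how the per-orthogroup steps could be chained, the helper below only assembles command lines for alignment, trimming, and gene-tree inference; it does not run them. The exact options used in this study are not given, so the MAFFT `--auto` mode, the trimAl `-automated1` heuristic, the thread count, and the file naming are assumptions, and the function name is hypothetical. The coalescent step is left out.

```python
from pathlib import Path


def gene_tree_commands(fasta: str, threads: int = 4) -> list[list[str]]:
    """Build (but do not run) commands for one orthogroup FASTA file."""
    stem = Path(fasta).stem
    aligned = f"{stem}.aln.fasta"
    trimmed = f"{stem}.trim.fasta"
    return [
        # MAFFT writes the alignment to stdout; redirect it to `aligned` when running.
        ["mafft", "--auto", "--thread", str(threads), fasta],
        # Assumed trimming heuristic; the source only mentions an automated parameter.
        ["trimal", "-in", aligned, "-out", trimmed, "-automated1"],
        # ML search plus bootstrapping with the model and replicate count from the text.
        ["raxml-ng", "--all", "--msa", trimmed, "--model", "GTR+G",
         "--bs-trees", "1000", "--prefix", stem, "--threads", str(threads)],
    ]


commands = gene_tree_commands("HOG0001.fasta")
```

Each command list could be passed to `subprocess.run`, with the MAFFT output captured to the alignment file, and the resulting gene trees concatenated into a single file for the coalescent analysis.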
Comparative genomics of the Epilobieae
The robust phylogenetic framework of the order Myrtales provides a solid foundation for exploring gene family evolution in two species of the tribe Epilobieae. To improve the accuracy of gene family clustering analysis, we reduced the number of species per family based on the phylogenetic analysis of the Myrtales. Specifically, we selected the following species for gene family clustering: Eucalyptus grandis, Syzygium grande, and Corymbia citriodora from the Myrtaceae; Melastoma candidum and M. dodecandrum from the Melastomataceae; Conocarpus erectus, Laguncularia racemosa, and Lumnitzera racemosa from the Combretaceae; Pemphis acidula, Sonneratia alba, and Lagerstroemia speciosa from the Lythraceae; C. angustifolium, E. hirsutum, and Cl. xantiana from the Onagraceae; and V. vinifera and O. sativa as outgroups (Appendix 1).
A supermatrix dataset is required for molecular clock analyses. We constructed a concatenated tree using the protein sequences of 94 single-copy orthologous genes in RAxML v8.2.12 [89] with the PROTGAMMAJTT model and 1000 bootstrap replicates. The summary phylogenetic tree was used as a constraint tree for divergence time estimation. The MCMCtree module in PAML v4.7 [90] was used to estimate divergence times among these 16 species with the HKY85 nucleotide substitution model and the correlated-rates molecular clock model. To enhance the accuracy of the divergence time estimation, four reliable fossil calibrations were incorporated. First, the root node of eudicots was placed at 119.6–128.63 Mya [91]. Second, based on Sonneratia-like pollen, the common ancestor of Sonneratia and Lagerstroemia was set to a time no earlier than 55.8 Mya [92]. Finally, we used calibrations from TimeTree [40] for Vitis–Oryza (142.1–163.5 Mya) and Eucalyptus–Melastoma (70.8–94.5 Mya). All MCMC analyses were run twice independently to ensure convergence, with 10,000,000 generations and sampling every 500 generations after a burn-in of 1,000,000 iterations.
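The sampling scheme and the two-run convergence check can be made concrete with a short sketch. It assumes that the 10,000,000 generations are counted after the burn-in and that convergence is judged by comparing posterior mean node ages between the two runs; both are assumptions, as are the function names and the toy ages.

```python
def retained_samples(generations: int, sample_every: int) -> int:
    """Number of posterior samples kept when sampling every `sample_every` generations."""
    return generations // sample_every


def max_relative_difference(run_a: list, run_b: list) -> float:
    """Largest relative difference between matching node ages of two runs."""
    if len(run_a) != len(run_b):
        raise ValueError("runs must report the same nodes")
    return max(abs(a - b) / ((a + b) / 2) for a, b in zip(run_a, run_b))


n_samples = retained_samples(10_000_000, 500)

# Toy posterior mean node ages in Mya for two runs (made-up values).
run1_ages = [100.0, 50.0, 20.0]
run2_ages = [101.0, 49.5, 20.0]
worst = max_relative_difference(run1_ages, run2_ages)
```

A small `worst` value indicates that both chains sampled similar posterior ages; what counts as small enough is a judgment call that the text does not specify.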
Gene family expansion and contraction analyses were performed using CAFE 5 [93]. The GAMMA model was employed to ensure model convergence, with the -k parameter ranging from 2 to 5. The final results were obtained with \(\:\text{k}=2\).
Chromosome evolution of the Epilobieae
To identify the whole-genome duplication events in the evolutionary history of the Epilobieae, we constructed \(\:{K}_{\text{s}}\) distributions for the two Epilobieae species and Cl. xantiana from the tribe Onagreae using WGDI v0.6.5 [94].
In the absence of reliable fossil records of Chamerion and Epilobium, the 95% confidence interval for the divergence time estimate between C. angustifolium and E. hirsutum is relatively broad. Therefore, we used the median divergence time between these two species as a reference point to estimate the ages of the WGDs. Specifically, based on the \(\:{K}_{\text{s}}\) distributions of C. angustifolium, E. hirsutum, and Cl. xantiana, we calculated the \(\:{K}_{\text{s}}\) distances between each pair of species and used the formula \(\:T={K}_{\text{s}}/(2\times r)\) to estimate the molecular evolutionary rate (r, average \(\:{K}_{\text{s}}\)/year rate) of C. angustifolium and E. hirsutum. The r values were taken as 9.85e-9 and 1.33e-8 for the genomes of C. angustifolium and E. hirsutum, respectively. Since the two WGD events occurred prior to the divergence of these two Epilobieae species from their MRCA, we averaged the WGD times obtained for the two species to determine the final timing of the WGD events.
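The dating relation can be sketched in a few lines: the rate r is obtained by rearranging \(T = K_{\text{s}}/(2r)\) for a pair with a known divergence time, and each WGD peak is then converted to an age with the species-specific rate before averaging. The two rates are the values stated above; the peak \(K_{\text{s}}\) values in the example are placeholders rather than results of this study, and the function names are hypothetical.

```python
import math


def substitution_rate(ks: float, divergence_time_years: float) -> float:
    """Rearranged T = Ks / (2 r): synonymous substitutions per site per year."""
    return ks / (2.0 * divergence_time_years)


def event_age(ks_peak: float, rate: float) -> float:
    """Age in years of an event whose paralog Ks distribution peaks at ks_peak."""
    return ks_peak / (2.0 * rate)


# Rates reported in the text for the two genomes.
R_C_ANGUSTIFOLIUM = 9.85e-9
R_E_HIRSUTUM = 1.33e-8


def mean_wgd_age(ks_peak_cang: float, ks_peak_ehir: float) -> float:
    """Average of the ages inferred independently from the two genomes."""
    age_cang = event_age(ks_peak_cang, R_C_ANGUSTIFOLIUM)
    age_ehir = event_age(ks_peak_ehir, R_E_HIRSUTUM)
    return (age_cang + age_ehir) / 2.0


# Placeholder peak positions (illustrative only, not values from this study).
example_age = mean_wgd_age(ks_peak_cang=0.30, ks_peak_ehir=0.40)
```

Using a separate rate for each genome lets a faster-evolving lineage, which shows a larger peak \(K_{\text{s}}\) for the same event, still yield a comparable age.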
To investigate the chromosomal evolution history of the Epilobieae, we employed ChromEvol v2.1 [28] to infer ancestral chromosome numbers across the phylogenetic tree. The analysis included two species from the tribe Epilobieae and Cl. xantiana, with Punica granatum and S. alba from the family Lythraceae serving as outgroup species, given the close relationship between Lythraceae and Onagraceae. The haploid chromosome numbers of these species were obtained from the Chromosome Counts Database (CCDB) [95]. We used WGDI to identify adjacent conserved collinear genes and blocks among all chromosome pairs within the two Epilobieae species, and then inferred the chromosome number of the MRCA of C. angustifolium and E. hirsutum. We identified gene pairs in the collinear regions by analyzing the protein sequences between C. angustifolium and E. hirsutum using JCVI v0.9.14 [96], which also generated the visualizations of the results.
WGD retained duplicates identification and GLMM analysis
Based on the results of the gene family clustering analysis of the 16 species, we identified the duplicate genes retained after the two WGD events in C. angustifolium and E. hirsutum, using Eu. grandis as the outgroup, as it does not share the two WGD events of the Epilobieae lineage. Gene families containing four genes in the Epilobieae genomes and at least one gene from Eu. grandis were retained. Gene duplication events with support greater than 0.5 were extracted. Genes common to both screening results were identified as candidates for those retained after the WGD events, with tandem repeats subsequently removed. To distinguish the duplicated genes retained after different WGD events, we calculated the \(\:{K}_{\text{s}}\) values of all duplicated gene pairs. The average \(\:{K}_{\text{s}}\) peak values associated with the two polyploidy events in the Epilobieae genomes were used as filter conditions. Duplicated genes with \(\:{K}_{\text{s}}\) values lower than the average were identified as those retained after the WGDE event, while those with higher values were considered as being retained after the WGDA event.
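The assignment of duplicate pairs to the two events can be sketched as a simple threshold rule. The sketch assumes that the cutoff is the mean of the two peak \(K_{\text{s}}\) values and that a pair lying exactly on the cutoff is assigned to WGDA; both are interpretations, and the function name, peak positions, and gene pairs are hypothetical.

```python
def classify_pairs(pair_ks: dict, peak_wgde: float, peak_wgda: float) -> dict:
    """Label each duplicated gene pair by the WGD event it is assigned to.

    pair_ks maps (gene_a, gene_b) -> Ks of that pair. Pairs below the midpoint
    of the two peaks are labelled "WGDE", all others "WGDA".
    """
    cutoff = (peak_wgde + peak_wgda) / 2.0
    return {pair: ("WGDE" if ks < cutoff else "WGDA") for pair, ks in pair_ks.items()}


# Toy peaks and gene pairs (made-up values, not results from this study).
toy_pairs = {("g1", "g2"): 0.25, ("g3", "g4"): 0.95, ("g5", "g6"): 0.55}
labels = classify_pairs(toy_pairs, peak_wgde=0.3, peak_wgda=0.9)
```

Pairs with \(K_{\text{s}}\) values close to the cutoff are the least certain assignments and may deserve a separate check.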
After identifying the duplicated genes retained after the two WGD events in both species, we used a GLMM implemented in the lme4 package in R [97] to analyze the factors explaining variance in the retention of genes after the WGD events. The explanatory variables were: Species (categorical, indicating the species to which the gene belongs), KEGGtype (discrete, representing the first-level classification of KEGG), Length (continuous, representing gene length), GC (continuous, representing GC content), nExon (continuous, representing number of exons) and Chrom (categorical, representing the chromosome and treated as a random factor) (Supplementary Data 3). Interaction effects between Species and nExon, as well as between Species and WGD, were also considered.
Data availability
The data that support the findings of this study have been deposited in the National Genomics Data Center (NGDC) under BioProject ID PRJCA029951. The raw genomic Illumina reads, PacBio reads, Hi-C reads, and RNA-seq reads have been deposited in the Genome Sequence Archive (GSA, https://ngdc.cncb.ac.cn/gsa) under accession number CRA019714. The genome assemblies have been deposited in the Genome Warehouse (GWH, https://ngdc.cncb.ac.cn/gwh) under accession numbers GWHFFOT00000000.1 (C. angustifolium) and GWHFFOU00000000.1 (E. hirsutum). The genome annotations and source data are available at Figshare https://figshare.com/s/5b9a4e6aa37cc73b8675.
Abbreviations
WGD: Whole-genome duplication
Hi-C: High-throughput chromosome conformation capture
BUSCO: Benchmarking Universal Single-Copy Orthologs
CEGMA: The core eukaryotic genes mapping approach assessment
CEGs: Core eukaryotic genes
LTR-RTs: Long terminal repeat retrotransposons
SINE: Short interspersed nuclear elements
CDSs: Protein-coding genes
KEGG: Kyoto Encyclopedia of Genes and Genomes
CI: Confidence interval
GLMM: Generalized linear mixed-effects model
AS: Alternative splicing
References
Dahlgren R, Thorne RF. The order Myrtales: circumscription, variation, and relationships. Ann Mo Bot Gard. 1984;71(3):633–99.
The Angiosperm Phylogeny Group, Chase MW, Christenhusz MJM, Fay MF, Byng JW, Judd WS, Soltis DE, Mabberley DJ, Sennikov AN, Soltis PS, et al. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc. 2016;181(1):1–20.
Yoshida T, Amakura Y, Yoshimura M. Structural features and biological properties of ellagitannins in some plant families of the order Myrtales. Int J Mol Sci. 2010;11:79–106.
He Z, Feng X, Chen Q, Li L, Li S, Han K, Guo Z, Wang J, Liu M, Shi C, et al. Evolution of coastal forests based on a full set of Mangrove genomes. Nat Ecol Evol. 2022;6(6):738–49.
Hao Y, Zhou Y-Z, Chen B, Chen G-Z, Wen Z-Y, Zhang D, Sun W-H, Liu D-K, Huang J, Chen J-L. The Melastoma dodecandrum genome and the evolution of Myrtales. J Genet Genomics. 2022;49(2):120–31.
Sytsma KJ, Litt A, Zjhra ML, Chris Pires J, Nepokroeff M, Conti E, Walker J, Wilson PG. Clades, clocks, and continents: historical and biogeographical analysis of Myrtaceae, Vochysiaceae, and relatives in the Southern Hemisphere. Int J Plant Sci. 2004;165(S4):S85–105.
Berger BA, Kriebel R, Spalink D, Sytsma KJ. Divergence times, historical biogeography, and shifts in speciation rates of Myrtales. Mol Phylogen Evol. 2016;95:116–36.
Zhang X-F, Landis JB, Wang H-X, Zhu Z-X, Wang H-F. Comparative analysis of Chloroplast genome structure and molecular dating in Myrtales. BMC Plant Biol. 2021;21(1):219.
Conti E, Litt A, Wilson PG, Graham SA, Briggs BG, Johnson LAS, Sytsma KJ. Interfamilial relationships in Myrtales: molecular phylogeny and patterns of morphological evolution. Syst Bot. 1997;22(4):629–47.
Maurin O, Anest A, Bellot S, Biffin E, Brewer G, Charles-Dominique T, Cowan RS, Dodsworth S, Epitawalage N, Gallego B, et al. A nuclear phylogenomic study of the angiosperm order Myrtales, exploring the potential and limitations of the universal Angiosperms353 probe set. Am J Bot. 2021;108(7):1087–111.
Cheng L, Li M, Han Q, Qiao Z, Hao Y, Balbuena TS, Zhao Y. Phylogenomics resolves the phylogeny of Theaceae by using low-copy and multi-copy nuclear gene makers and uncovers a fast radiation event contributing to tea plants diversity. Biology. 2022;11:1007.
Zhang N, Zeng L, Shan H, Ma H. Highly conserved low-copy nuclear genes as effective markers for phylogenetic analyses in angiosperms. New Phytol. 2012;195(4):923–37.
Zeng L, Zhang Q, Sun R, Kong H, Zhang N, Ma H. Resolution of deep angiosperm phylogeny using conserved nuclear genes and estimates of early divergence times. Nat Commun. 2014;5(1):4956.
Zhang G, Ma H. Nuclear phylogenomics of angiosperms and insights into their relationships and evolution. J Integr Plant Biol. 2024;66(3):546–78.
Baum DA, Sytsma KJ, Hoch PC. A phylogenetic analysis of Epilobium (Onagraceae) based on nuclear ribosomal DNA sequences. Syst Bot. 1994;19(3):363–88.
Landis JB, Soltis DE, Li Z, Marx HE, Barker MS, Tank DC, Soltis PS. Impact of whole-genome duplication events on diversification rates in angiosperms. Am J Bot. 2018;105(3):348–63.
Otto SP. The evolutionary consequences of polyploidy. Cell. 2007;131(3):452–62.
Soltis PS, Soltis DE. Ancient WGD events as drivers of key innovations in angiosperms. Curr Opin Plant Biol. 2016;30:159–65.
Bowers JE, Chapman BA, Rong J, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422(6930):433–8.
Rieseberg LH. Chromosomal rearrangements and speciation. Trends Ecol Evol. 2001;16(7):351–8.
Adams KL, Wendel JF. Polyploidy and genome evolution in plants. Curr Opin Plant Biol. 2005;8(2):135–41.
Van de Peer Y, Mizrachi E, Marchal K. The evolutionary significance of polyploidy. Nat Rev Genet. 2017;18(7):411–24.
Stebbins GL. Chromosomal evolution in higher plants. London: Edward Arnold LTD; 1971.
Raven PH. Generic and sectional delimitation in Onagraceae, tribe Epilobieae. Ann Mo Bot Gard. 1976;63:326–40.
Manni M, Berkeley MR, Seppey M, Simao FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38(10):4647–54.
Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23(9):1061–7.
Feng X, Chen QP, Wu WH, Wang JX, Li GH, Xu SH, Shao S, Liu M, Zhong CR, Wu C-I, et al. Genomic evidence for rediploidization and adaptive evolution following the whole-genome triplication. Nat Commun. 2024;15(1):1635.
Glick L, Mayrose I. ChromEvol: assessing the pattern of chromosome number evolution and the inference of polyploidy along a phylogeny. Mol Biol Evol. 2014;31(7):1914–22.
Álvarez I, Costa A, Feliner GN. Selecting single-copy nuclear genes for plant phylogenetics: a preliminary analysis for the Senecioneae (Asteraceae). J Mol Evol. 2008;66(3):276–91.
Zimmer EA, Wen J. Using nuclear gene data for plant phylogenetics: progress and prospects II. Next-gen approaches. J Syst Evol. 2015;53(5):371–9.
Xiang Y, Huang C-H, Hu Y, Wen J, Li S, Yi T, Chen H, Xiang J, Ma H. Evolution of Rosaceae fruit types based on nuclear phylogeny in the context of geological times and genome duplication. Mol Biol Evol. 2017;34(2):262–81.
Huang C-H, Sun R, Hu Y, Zeng L, Zhang N, Cai L, Zhang Q, Koch MA, Al-Shehbaz I, Edger PP, et al. Resolution of Brassicaceae phylogeny using nuclear genes uncovers nested radiations and supports convergent morphological evolution. Mol Biol Evol. 2016;33(2):394–412.
Messeder JVS, Carlo TA, Zhang G, Tovar JD, Arana C, Huang J, Huang C-H, Ma H. A highly resolved nuclear phylogeny uncovers strong phylogenetic conservatism and correlated evolution of fruit color and size in Solanum L. New Phytol. 2024;243(2):765–80.
Liu L, Wu S, Yu L. Coalescent methods for estimating species trees from phylogenomic data. J Syst Evol. 2015;53(5):380–90.
Liu L, Xi Z, Wu S, Davis CC, Edwards SV. Estimating phylogenetic trees from genome-scale data. Ann N Y Acad Sci. 2015;1360(1):36–53.
Leebens-Mack JH, Barker MS, Carpenter EJ, Deyholos MK, Gitzendanner MA, Graham SW, Grosse I, Li Z, Melkonian M, Mirarab S, et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature. 2019;574(7780):679–85.
Yang L, Su D, Chang X, Foster CSP, Sun L, Huang C-H, Zhou X, Zeng L, Ma H, Zhong B. Phylogenomic insights into deep phylogeny of angiosperms based on broad nuclear gene sampling. Plant Commun. 2020;1(2):100027.
Zhao L, Li X, Zhang N, Zhang S-D, Yi T-S, Ma H, Guo Z-H, Li D-Z. Phylogenomic analyses of large-scale nuclear genes provide new insights into the evolutionary relationships within the Rosids. Mol Phylogen Evol. 2016;105:166–76.
Magallón S, Gómez-Acevedo S, Sánchez-Reyes LL, Hernández-Hernández T. A metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity. New Phytol. 2015;207(2):437–53.
Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol. 2017;34(7):1812–9.
Sabara HA, Kron P, Husband BC. Cytotype coexistence leads to triploid hybrid production in a diploid-tetraploid contact zone of Chamerion angustifolium (Onagraceae). Am J Bot. 2013;100(5):962–70.
Seavey SR. Experimental hybridization and chromosome homologies in Boisduvalia sect. Boisduvalia (Onagraceae). Syst Bot. 1992;17(1):84–90.
Faria R, Navarro A. Chromosomal speciation revisited: rearranging theory with pieces of evidence. Trends Ecol Evol. 2010;25(11):660–9.
Otto SP, Whitton J. Polyploid incidence and evolution. Annu Rev Genet. 2000;34:401–37.
Seavey SR, Raven PH. Chromosomal evolution in Epilobium sect. Epilobium (Onagraceae), II. Plant Syst Evol. 1977;128(3/4):195–200.
Seavey SR, Raven PH. Chromosomal evolution in Epilobium sect. Epilobium (Onagraceae). Plant Syst Evol. 1977;127:107–19.
Seavey SR, Raven PH. Chromosomal evolution in Epilobium sect. Epilobium (Onagraceae), III. Plant Syst Evol. 1978;130:79–83.
Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008;40(12):1413–5.
Matlin AJ, Clark F, Smith CWJ. Understanding alternative splicing: towards a cellular code. Nat Rev Mol Cell Biol. 2005;6(5):386–98.
Blencowe BJ. Alternative splicing: new insights from global analyses. Cell. 2006;126(1):37–47.
Slotte T, Huang H-R, Holm K, Ceplitis A, Onge KS, Chen J, Lagercrantz U, Lascoux M. Splicing variation at a FLOWERING LOCUS C homeolog is associated with flowering time variation in the tetraploid Capsella bursa-pastoris. Genetics. 2009;183(1):337–45.
Staiger D, Brown JWS. Alternative splicing at the intersection of biological timing, development, and stress responses. Plant Cell. 2013;25(10):3640–56.
Santner A, Estelle M. Recent advances and emerging trends in plant hormone signalling. Nature. 2009;459(7250):1071–8.
Wang X, Deng Y, Gao L, Kong F, Shen G, Duan B, Wang Z, Dai M, Han Z. Series-temporal transcriptome profiling of cotton reveals the response mechanism of phosphatidylinositol signaling system in the early stage of drought stress. Genomics. 2022;114(5):110465.
Yonekura-Sakakibara K, Saito K. Function, Structure, and Evolution of Flavonoid Glycosyltransferases in Plants. In: Recent Advances in Polyphenol Research. Edited by A. Romani, V. Lattanzio, Quideau S, vol. 4. New York: John Wiley & Sons, Ltd; 2014: 61–82.
Averett JE, Raven PH. Flavonoids of Onagraceae. Ann Mo Bot Gard. 1984;71(1):30–4.
Dixon RA, Pasinetti GM. Flavonoids and isoflavonoids: from plant biology to agriculture and neuroscience. Plant Physiol. 2010;154(2):453–7.
Murthy PPN. Inositol Phosphates and Their Metabolism in Plants. In: myo-Inositol Phosphates, Phosphoinositides, and Signal Transduction. Edited by Biswas BB, Biswas S. Boston, MA: Springer US; 1996: 227–255.
Cridland C, Gillaspy G. Inositol pyrophosphate pathways and mechanisms: what can we learn from plants? Molecules. 2020;25(12):2789.
He J, Lin S, Yu Z, Song A, Guan Z, Fang W, Chen S, Zhang F, Jiang J, Chen F, et al. Identification of 5S and 45S rDNA sites in Chrysanthemum species by using oligonucleotide fluorescence in situ hybridization (Oligo-FISH). Mol Biol Rep. 2021;48(1):21–31.
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):18.
Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764–70.
Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2):170–5.
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326(5950):289–93.
Zhang XT, Zhang SC, Zhao Q, Ming R, Tang HB. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat Plants. 2019;5(8):833–45.
Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, Aiden EL. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3(1):99–101.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(suppl2):W265–8.
Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA. 2020;117(17):9451–7.
Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21(suppl1):i351–8.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110(1–4):462–7.
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–80.
Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: Ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34(suppl2):W435–9.
Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20(16):2878–9.
Parra G, Blanco E, Guigo R. GeneID in Drosophila. Genome Res. 2000;10(4):511–5.
Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PIW. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008;24(24):2938–9.
Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268(1):78–94.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome Res. 2004;14(5):988–95.
Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):1–22.
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–66.
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–64.
Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25(10):1335–7.
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20(1):238.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. TrimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25(15):1972–3.
Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. 2019;35(21):4453–5.
Zhang C, Mirarab S. Weighting by gene tree uncertainty improves accuracy of quartet-based species trees. Mol Biol Evol. 2022;39(12):msac215.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91.
Morris JL, Puttick MN, Clark JW, Edwards D, Kenrick P, Pressel S, Wellman CH, Yang Z, Schneider H, Donoghue PCJ. The timescale of early land plant evolution. Proc Natl Acad Sci USA. 2018;115(10):E2274–83.
Graham SA. Fossil records in the Lythraceae. Bot Rev. 2013;79(1):48–145.
Mendes FK, Vanderpool D, Fulton B, Hahn MW. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 2021;36(22–23):5516–8.
Sun PC, Jiao BB, Yang YZ, Shan LX, Li T, Li X, Xi Z, Wang X, Liu J. WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant. 2022;15(12):1841–51.
Rice A, Glick L, Abadi S, Einhorn M, Kopelman NM, Salman-Minkov A, Mayzel J, Chay O, Mayrose I. The chromosome counts database (CCDB) - a community resource of plant chromosome numbers. New Phytol. 2015;206(1):19–26.
Tang H, Krishnakumar V, Zeng X, Xu Z, Taranto A, Lomas JS, Zhang Y, Huang Y, Wang Y, Yim WC, et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta. 2024;3(4):e211.
Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67(1):1–48.
© 2025. This work is licensed under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.