In the context of intensifying global change, there is a growing need for broad-scale monitoring strategies and ecosystem assessment (Cardinale et al., 2012; Cordier et al., 2020). Approaches based on environmental DNA (eDNA), broadly defined as the total pool of DNA that can be isolated from the environment (Pawlowski, 2020; Rodriguez-Ezpeleta et al., 2021; Taberlet et al., 2012), represent high-throughput, cost-effective, non-invasive tools that are being increasingly used in biodiversity monitoring programs (Bohmann et al., 2014; Deiner et al., 2017). One of the most common methods to interpret the eDNA signal from a complex community is metabarcoding, which allows for multiple taxa to be investigated in a single sequencing experiment based on short marker genes (Hajibabaei et al., 2011; Taberlet et al., 2012). This approach has led to numerous successful biodiversity assessments of terrestrial and aquatic biota, including metazoans (e.g., Deiner et al., 2017; Hänfling et al., 2016; Sigsgaard et al., 2016; Taberlet et al., 2018), and has the potential to help us gain a more holistic view of an ecosystem with hundreds of organisms identified simultaneously from a single environmental sample. Metabarcoding is a highly sensitive approach that can detect rare or cryptic species (Port et al., 2016; Thomsen et al., 2012), and is seen as a promising approach in assessment studies of aquatic ecosystems (Aylagas et al., 2016; Cordier et al., 2017; Yang & Zhang, 2020). Nevertheless, metabarcoding as well as other PCR-based techniques, such as quantitative PCR, introduce biases. For example, universal primers used to barcode multiple groups of taxa simultaneously do not necessarily bind equally efficiently to different templates, leading to amplification bias or the complete lack of detection of certain groups (Alberdi et al., 2018; Kelly et al., 2019; Tedersoo et al., 2015).
In contrast to the PCR-based approach of metabarcoding, metagenomics is the application of shotgun sequencing technologies to study the entire DNA pool of all organisms present in an environmental sample, without targeting a specific gene marker (Tringe & Rubin, 2005). Metagenomics is a powerful tool for studying the vast prokaryotic diversity in aquatic and other ecosystems (Grossart et al., 2020). However, metagenomics has only recently gained traction in the study of eukaryotes including macro-organisms such as metazoans, and is now seen as a complement (Singer et al., 2020) and a potential alternative to metabarcoding. There are several reasons that might explain the low number of studies using metagenomics to investigate eukaryotes (Barnes & Turner, 2016). First, the capacity of metagenomics to capture the macro-eukaryote signal is not well resolved. Generally, it is believed that macro-eukaryotes are present in much lower densities in the environment compared to microbes (Azam & Malfatti, 2007), which might limit the recovery of the macro-eukaryote DNA signal. Second, issues related to the large size and low coding density of eukaryote nuclear genomes may contribute to limitations in assessing their diversity in metagenomes. Nuclear genomes of eukaryotes contain many repetitive elements that are difficult to assemble into scaffolds, as well as long non-coding sequences which are generally less taxonomically informative (Bik et al., 2012). Abundance estimations of eukaryotes based on metagenomics are further complicated by the high interspecific variability in the number of nuclear-encoded rRNA gene copies (Bik et al., 2012). Nevertheless, recent studies have reported accurate estimations of eukaryotic taxa and, to some extent, their relative abundance using shotgun metagenomic approaches on mock communities (Ji et al., 2019; Garrido-Sanz et al., 2020). It is worth noting, however, that these are based on taxa for which reference genomes/mitogenomes are available. Both micro- and macro-organisms often lack representative genomes in reference databases, making taxonomic assignment challenging. Curated DNA reference databases are however growing rapidly, and thus it is expected that such limitations will continue to decrease in the near future (e.g., Lewin et al., 2018).
Despite the challenges, a handful of studies have shown promising results in applying metagenomics for biodiversity assessments of metazoans in aquatic ecosystems (e.g., Cowart et al. (2018); Singer et al. (2020); Machida et al. (2021); Manu and Govindhaswamy (2021)) and sediments (e.g., Pedersen et al. (2016); Gelabert et al. (2021)). Metagenomics targeting metazoans, however, requires adapting bioinformatics pipelines to accommodate the diluted metazoan signal in eDNA, especially when targeting rare organisms. For instance, in microbial metagenomic studies, reads are typically assembled and binned into metagenome-assembled genomes before annotation. Analyses based on assembled metagenomes however are not always feasible when working with extra-organismal or extra-cellular DNA, likely due to the degraded nature and limited amount of starting genetic material (Barnes & Turner, 2016). An alternative to assembly is to annotate unassembled metagenomes directly by mapping reads to reference databases of nucleotide or protein sequences. Although computationally intensive, this approach has been reported effective when taxonomy is assigned using a Last Common Ancestor (LCA) algorithm or in combination with compositional interpolated Markov models (Quince et al., 2017). Given the relatively nascent nature of this field, we chose to compare the differences in diversity detected via metagenomics by applying alternative workflows: one that is focused on targeting a taxonomically informative gene marker, the 18S rRNA gene in eukaryotes (hereafter referred to as the SSU rRNA gene prediction approach) within the metagenomic data, and the second approach that is a broader analysis of the tens of millions of metagenomic reads (hereafter referred to as the whole metagenome approach).
Here, we provide new insight into the effectiveness and reliability of metagenomics applied to eDNA in lake water samples for describing freshwater zooplankton. Our main questions are: i) can we effectively detect zooplankton diversity in lake water metagenomes, ii) how does the metagenomic gene prediction approach based on a single taxonomic marker (SSU rRNA gene) perform compared to mapping the entire eukaryotic fraction of metagenome reads, and iii) do diversity metrics derived from metagenomes show similar responses to key environmental gradients as those detected with morphological taxonomic surveys? We assessed zooplankton (Cladocera, Copepoda, and Rotifera) diversity based on surface water metagenomes from 22 lakes in eastern Canada and compared these results with zooplankton data from morphologically identified samples collected in net hauls from the same sites. We set our comparative study into an ecological context using a suite of commonly used aquatic parameters (i.e., total phosphorus [TP], specific conductivity, lake depth, and an index of human impact [HII] in the watershed) to determine whether both data types (metagenomics and microscopy) yielded similar patterns. Our study is a timely response to the growing interest in adapting metagenomics techniques for advancing a holistic perspective of aquatic food webs across all domains of life from a single environmental snapshot.
METHODS Sites descriptionThe 22 lakes were sampled as part of the Natural Sciences and Engineering Research Council of Canada (NSERC) Canadian Lake Pulse Network campaign in 2017 during the time of maximum summer lake stratification (between July and September) to minimize seasonal variability (Huot et al., 2019) (Figure 1, Table S1). Lakes were selected across a range of surface area and human impact categories in eastern Canada. Sampled lakes spanned a range of morphological characteristics and trophic states, as summarized in Table S1. Sampling occurred in the euphotic zone at a station situated at the maximum depth of each lake as measured on site using a sonar. Basic environmental variables used in the present study were sampled using standard protocols which are detailed in the LakePulse field protocol (NSERC Canadian Lake Pulse Network, 2021). Epilimnetic TP concentrations were measured over an integrated sample of the water column (Griffiths et al., 2021) and conductivity was measured by averaging the values over the sampling depth as obtained by a multiparameter probe (RBR Maestro 3 profile, RBR Ltd.) (Kraemer et al., 2020). A human impact index (HII) was derived within each watershed based on the proportions of different land use and land cover. Each pixel within a given watershed was assigned a human impact value between 0 and 1 using the following cut-offs: (urban development/road: 1; mines/oil: 1, agriculture: 1, pasture: 0.5, forestry: 0.5, and natural landscapes: 0). The average HII for the lake was then calculated across the watershed (Huot et al., 2019).
Sampling, taxonomic identification, andCrustacean zooplankton was sampled over the depth of the water column from 1 m above the sediment up to the water surface using a Wisconsin net with 100 μm mesh (10 cm net radius and 100 cm length). For relatively shallow lakes (<6 m-deep), additional vertical hauls were taken in the same manner to increase sample volume. Crustacean zooplankton was anesthetized with CO2 (Alka-Seltzer) and preserved in 70% ethanol (approx. final concentration) at room temperature. Species-level identification of crustacean zooplankton was done with a dissecting microscope under 100×–400× magnification by BSA Environmental Services (Ohio, USA). Species biomass was estimated following the method from McCauley (1984). A detailed identification protocol is available in Paquette et al. (2021).
Rotifer counts were done on Lugol-preserved tow-haul samples collected in the same manner as the crustacean zooplankton samples (above), except that instead of sampling from 1 m above the sediment to the lake surface, the rotifer samples were collected from the euphotic zone only. In several instances, the euphotic zone is identical to max depth minus 1 m. The coherence between the original cladoceran zooplankton counts and the rotifer counts performed on a different set of samples was verified by counting Bosminidae individuals in both sample types, to confirm that the preserved samples for rotifer counting were representative of the original zooplankton samples (File S1; Figure S1).
EnvironmentalWater for eDNA was collected at the same station as the net hauls with an acid-washed integrated depth sampler over the euphotic zone down to 2 m below the surface. Our eDNA sampling strategy aimed at targeting mainly extra-organismal DNA, that is, DNA that is not contained within whole organisms (Rodriguez-Ezpeleta et al., 2021), sometimes also referred to as ‘extracellular DNA’ (Bohmann et al., 2014; Taberlet et al., 2012). Thus, for samples dedicated to eDNA analysis, water was first passed through a 100 μm nylon mesh to remove large cells and debris, including most zooplankton groups, which typically range from 50 to 2000 μm in length for Rotifers (Wallace & Snell, 2010) and from >153 to 2000 μm in length for crustacean zooplankton, with the exception of nauplii and juvenile stages which can measure <100 μm in length. Up to 500 ml of water was vacuum-filtered on a Durapore 0.22 μm membrane (Sigma–Aldrich) through a glass funnel apparatus at a maximum pressure of 8 inHg until the filter clogged. Filtrations were done on site in a tent, and two field replicates were collected. Filters were preserved immediately thereafter in cryovials at −80°C until analysis. Caution was taken to limit foreign DNA contamination in the field. A single lake was sampled per day to avoid cross-contamination. All equipment was acid-washed between lakes, and gloves were worn during sampling and filtering.
In the laboratory, DNA was extracted from one of the two filters collected at each lake using the DNeasy PowerWater kit (QIAGEN) following the manufacturer's protocol with the addition of two optional steps as detailed by Garner et al. (2020). All laboratory procedures were performed following strict protocols at dedicated workstations to prevent contamination, but blanks were not collected in the field nor processed along with the samples during the DNA extractions (which is a limitation of the study). PCR blanks were performed with a 16S rRNA gene primer set for a complementary analysis of the same DNA extracts and this revealed no to minimal contamination (Kraemer et al., 2020). DNA quality was visually assessed on a gel and DNA was quantified using a Qubit 2.0 fluorometer and the dsDNA BR Assay kit (Invitrogen). An aliquot of each normalized DNA extract (unamplified) was sent to Genome Quebec facilities (Montreal, Canada) for shotgun library preparation and 150 bp paired-end sequencing on an Illumina NovaSeq 6000 with flow cell type S2.
Metagenomic analysis pipelinesWe applied two separate bioinformatic workflows on unassembled reads to identify eukaryote sequences in the 22 lake metagenomes (Figure 2 and File S2). Given the current lack of consensus in the bioinformatic workflows applied to eukaryotes, we opted to run two workflows that differ in their approaches and computational requirements to identify the amount of variation these would generate in taxa recovery.
FIGURE 2. Detailed bioinformatics workflow used for metagenomic analyses on shotgun sequencing data: (a) steps for the whole metagenome approach and (b) steps for the gene prediction metagenomic approach
In the whole metagenome approach, raw demultiplexed shotgun sequence files were quality checked using FastQC v.0.11.15. and adapter trimming and quality filtering were done with Trimmomatic v.0.38 (Bolger et al., 2014) with a minimum average quality of 25 and a minimum length of 36 nucleotides. The cleaned shotgun paired-end sequences were merged using PEAR (Zhang et al., 2014) before they were aligned against a local database consisting of all Eukarya entries in the NCBI non-redundant nucleotide database with the following parameters: min. e-value 0.001, min. Percentage identity = 70%, and retaining a maximum of 30 hits per read. Due to computational constraints, it is currently not possible to utilize all best BLAST hits in the LCA analysis with this approach. We systematically investigated differences in the threshold of hits used for LCA assignment (30 compared to 100 hits for two lakes) and found no significant differences in the taxonomic assignment and the number of reads obtained for each taxonomic group (Pearson correlation coefficient >0.99 for both lakes; Figure S2), increasing our confidence in this approach. The only notable differences between max. 30 and max. 100 hits were observed for the taxa at very low abundances (number of reads <3). These tended to disappear at max. Hits = 100, which confirmed our original doubts about the validity of taxonomic assignments for groups with an extremely low number of reads. The BLASTn output files were then imported in MEGAN6 v.6.20.17 (Huson et al., 2016) for taxonomic assignment based on the LCA algorithm with a minimum score of 80, a minimum similarity of 80%, a minimum support value of 2, and a minimum complexity filter set at 0.1. A detailed bioinformatic workflow is available as supplementary material (File S2).
In the SSU rRNA gene prediction approach (corresponding to 18S rRNA genes present in all metazoans), we applied the results of the “Raw reads analysis pipeline” of the European Bioinformatics Institute (EBI) MGnify (Mitchell et al., 2020). The detailed pipeline is described on the EBI website (
Diversity analyses based on zooplankton assemblages surveyed using both microscopy and metagenomics were conducted in R v.4.1.0. (R Core Team, 2020). All diversity indices were calculated on assemblages binned to the family rank to deal with uneven taxonomic assignment resolution for different zooplankton groups across analytical platforms. The most common diversity metrics (taxonomic richness, Shannon index, Pielou's evenness) were estimated on zooplankton abundance data (i.e., the number of individuals per liter or the number of sequencing reads per metagenome assigned to freshwater zooplankton taxa) using the diversity function of the package vegan (Oksanen et al., 2013). These metrics were then used in least-square regressions against key environmental gradients identified from an earlier analysis of eastern Canadian LakePulse sites (Griffiths et al., 2021): epilimnetic TP concentration, specific conductivity, lake depth, and an index of watershed disturbance calculated as the HII (Huot et al., 2019). None of the specimens count or read data was normalized for the diversity estimations, but data were transformed to relative abundances prior to the principal component analyses (PCA) and the RV coefficient calculations (as detailed below) to down-weight the effect of rare taxa and the frequent occurrence of double-zeros. All environmental variables were log10 transformed, except for HII values (percentages) that were arcsine transformed.
Principal component analyses (PCA) were performed separately for each dataset using the function prcomp on both log10 and Hellinger-transformed abundance (i.e., the number of individuals per liter or the number of reads sequenced) and biomass data where data were available (i.e., only for crustacean zooplankton observations) (Legendre & Gallagher, 2001). The three main principal components were extracted and used to derive an RV coefficient, analogous to Pearson's correlation coefficient for two given multivariate data matrices (Legendre & Birks, 2012). All possible pairwise comparisons between datasets were explored—densities or biomass vs. either predicted SSU rRNA genes or whole metagenomes, and predicted SSU rRNA genes vs. whole metagenomes. Coefficient significance was verified with the function coeffRV in FactoMineR (Lê et al., 2008). We also considered the congruence between community identifications done for each sample using morphological data and shotgun analyses by calculating pairwise Jaccard and Bray-Curtis dissimilarities (the former based on incidence data and the latter based on relative abundance data (number of individuals per liter or number of reads per metagenome) using the function vegdist in vegan (Oksanen et al., 2013). For this analysis, no biomass data were used.
RESULTS Zooplankton composition and diversity across analytical platformsBased on the microscopy analyses, we detected an average zooplankton family-level richness of 11.1 across the 18 lakes with complete zooplankton counts (Table 1; rotifer data were missing for three lakes). The most dominant families in terms of microscopy specimen counts were the Bosminidae, Cyclopidae, and Daphniidae, whereas the dominant families in terms of biomass (crustacean zooplankton only) were Daphniidae, Cyclopidae, and Diaptomidae. Across the 22 sites, the crustacean community was relatively even based on abundance data, with a Pielou's evenness index of 0.67 (0 = no evenness and 1 = complete evenness). Considering just the crustacean zooplankton families for which there is a larger data set for hundreds of lakes across eastern Canada (Paquette et al., 2021), we found comparable levels of richness and evenness within the range of the key environmental gradients captured by our 22 sites (Table 1).
TABLE 1 Summary of the main diversity indices (mean [min–Max]) estimated for the three survey approaches at the family taxonomic level
Approach | Taxonomic richness | Shannon index | Pielou's evenness |
Microscopy (this study) | 11.1 (7–15) | 1.6 (0.7–1.9) | 0.67 |
Predicted SSU rRNA genes | 5.6 (1–10) | 1.2 (0–1.9) | 0.74 |
Whole metagenome | 14.2 (10–21) | 1.3 (0–2.0) | 0.47 |
Microscopy (198 LakePulse sites) | 6.49 (2–10) | 1.14 (0.1–1.8) | 0.61 |
Note: Microscopy estimates for Shannon and Pielou were based on abundance data (the number of individuals per liter or the number of sequence reads).
Shotgun sequencing yielded on average ~28 million raw reads per metagenome and the number of reads per sample after quality filtering and merging of the pairs varied between ~13 and 23 million (mean = 20 million; Table S2). Overall, the proportion of the merged reads assigned to Eukaryotes ranged consistently between 0.5% and 1.2% of the total paired reads, with between 23% and 46% (mean = 34%) of the eukaryotic reads confidently assigned to metazoans (Table S2). With the whole metagenome BLAST approach, we detected a slightly greater average family richness of 15.95 (Table 1). Relative to the microscopy dataset we found that the assemblages in our 22 lakes were less even (mean Pielou's evenness = 0.47; Table 1). The dominant taxa in terms of the number of reads were Daphniidae, Diaptomidae, and Brachionidae (Rotifera). Using the SSU rRNA gene prediction approach, we detected the lowest average family richness relative to the other two analytical approaches, with a mean of 5.6 (min = 1 and max = 10). The dominant taxa detected were Diaptomidae, Synchaetidae (Rotifera), and Cyclopidae. Comparing the three platforms, we found that the SSU rRNA gene prediction approach yielded the lowest family diversity values but evenness estimates that were closer to those generated through the microscopy counts for zooplankton densities (Table 1, Figure S3).
Congruence of morphological and sequencing zooplankton familiesWe identified a subset of zooplankton families that were present across the predicted SSU rRNA genes, the microscopy, and the whole metagenome datasets (Figure 3a). Nineteen out of 23 families that were detected at most sites using the whole metagenome approach were also found in the microscopy dataset of the 22 lakes. Families that were absent from the microscopy data but present in the metagenomes are taxa that are often characterized as benthic or littoral (i.e., Harpacticidae (copepoda), Chironomidae (Diptera larvae), Adinetidae (Rotifera), and Philodinidae (Rotifera)).
FIGURE 3. (a) Diagram showing the number of zooplankton families and their overlap in detection via the three approaches: Morphology-based microscopy (blue), whole metagenome sequencing (green), and gene prediction approach based on the small subunit (SSU) rRNA subset of reads (orange). (b) Divergent plots showing the number of lakes in which each zooplankton family was detected via the different approaches (gray: microscopy; green: whole-metagenome approach; and orange: predicted SSU rRNA genes approach)
Zooplankton family occurrences across lakes were compared between the three analytical platforms (microscopy and two metagenomics approaches) to determine the level of congruence between survey methods (Figures 3b and 4). The heatmaps (Figure 4) show zooplankton abundance estimates for each type of dataset. Additionally, co-occurrences across the three analytical platforms are shown in the fourth heatmap to show which Family/Order was consistently detected/not detected across datasets at a given site. The families Ergasilidae (copepoda), Leptodoridae (cladocera), and Holopediidae (cladocera) were consistently absent at most sites (found only in a single or a few sites), whereas the Calanoida group (copepods—order level; found at 11 sites), Synchaetidae (rotifer; found at 11 sites), and the Cyclopoida group (copepoda—order level; found at 13 of the 19 sites) were the three taxa that were most consistently widely detected across all analytical platforms (Figure 4). It is worth noting, however, that since the Calanoida and Cyclopoida groups were binned at the order level, they are likely to comprise more than one Family each. The reason for this grouping was twofold: firstly, most of the genetic reference sequences for these clades were lacking finer taxonomic resolution and secondly, these groups include nauplii or juvenile stages which could not be assigned to one or the other order in the microscopy data based on morphological observations only.
FIGURE 4. (a–c) The colored heatmaps show the abundance at the Family level (or Order level for Cyclopoida and Calanoida copepods and for unidentified rotifer families) for each platform tested (whole metagenome, predicted SSU rRNA genes, and microscopy; colored heatmaps). The grayscale heatmap in (d) shows the number of co-detections for each zooplankton group across the three analytical platforms. A value of zero (lightest gray) signifies that a Family/Order was absent from all datasets, whereas a value of three (dark gray) indicates that the Family/Order was consistently found in all three datasets at a given site. The zooplankton groups are ordered alphabetically, and sites are organized numerically. Names of rotifer groups are colored in pink for help with interpretation
When comparing pairwise taxon occurrences across the three datasets for all zooplankton families and with copepods grouped at the order level (Calanoida and Cyclopoida), we found consistent detections in 45% of cases (either 3/3 or 0/3 detections). When comparing microscopy with either metagenomic approach, the overall number of dual positive detections was higher between microscopy and whole metagenome datasets, with a total of 34.3% positive matches across 17 lakes (two lakes missing whole metagenome data were excluded) compared to only 17.3% positive matches for the comparison with predicted SSU rRNA gene data in 19 lakes (Figure 4).
To consider the congruence of the entire assemblage between analytical platforms, we calculated dissimilarity indices. From a taxonomic presence–absence perspective (Jaccard distances), the whole metagenome approach more strongly reflected the community composition based on zooplankton specimen counts (p = 0.0008) than the predicted SSU rRNA gene data. In contrast, when zooplankton family relative abundances (Bray-Curtis dissimilarities) were considered, we found that the predicted SSU rRNA gene data performed similarly to the whole metagenome data based on the ANOVA (p = 0.8051; Figure 5).
FIGURE 5. Boxplots showing Bray-Curtis (left) and Jaccard (right) dissimilarities between the microscopy-based taxonomic composition and the sequence read composition using the whole-metagenome approach (green) and the predicted SSU rRNA gene approach (orange). The significant ANOVA test is indicated with an asterisk
We then performed PCA (based on relative abundance data) to help visualize the patterns of taxon dominance in the three datasets (as described in Section 3.1; Figure S4). To further explore the congruence between assemblages, we calculated the RV coefficient based on the three main PCA axes and found moderate and significant congruence between predicted SSU rRNA genes and morphology density data (Table 2). The congruence was generally stronger when relative abundance data were log10-transformed to account for potential overrepresentation of model organisms in the reference databases, which can yield inflated assigned read estimates compared to other taxonomic groups that are poorly populated in reference databases. When data were Hellinger-transformed, the RV coefficient between the whole metagenome and predicted SSU rRNA gene datasets was also of moderate strength (Table 2).
TABLE 2 Results of the congruence test between datasets (RV coefficients with significance values) using two types of data transformation to relative abundances (Hellinger or log10 + 1) on metagenomic sequence data (WM; whole metagenome approach and SSU; small subunit rRNA gene prediction approach) and microscopy data based on morphological identifications and counts
Matrices compared | Taxa included | Transformation | RV coefficient | p-value |
Density – SSU | All zooplankton | Hellinger | 0.39 | 0.004 |
Density – WM | All zooplankton | Hellinger | 0.006 | 0.93 |
SSU – WM | All zooplankton | Hellinger | 0.38 | 0.003 |
Density – SSU | All zooplankton | Log10 + 1 | 0.48 | 0.0006 |
Density – WM | All zooplankton | Log10 + 1 | 0.12 | 0.39 |
SSU – WM | All zooplankton | Log10 + 1 | 0.17 | 0.19 |
Biomass – SSU | Crustaceans only | Hellinger | 0.21 | 0.12 |
Biomass – WM | Crustaceans only | Hellinger | 0.03 | 0.96 |
Biomass – Density | Crustaceans only | Hellinger | 0.64 | 3.33E-06 |
SSU – WM | Crustaceans only | Hellinger | 0.25 | 0.045 |
Biomass – SSU | Crustaceans only | Log10 + 1 | NS | NS |
Biomass – WM | Crustaceans only | Log10 + 1 | NS | NS |
Biomass – Density | Crustaceans only | Log10 + 1 | 0.51 | 0.0002 |
SSU – WM | Crustaceans only | Log10 + 1 | 0.12 | 0.28 |
Density – SSU | Rotifers only | Hellinger | 0.09 | 0.68 |
Density – WM | Rotifers only | Hellinger | 0.13 | 0.53 |
SSU – WM | Rotifers only | Hellinger | 0.09 | 0.70 |
Density – SSU | Rotifers only | Log10 + 1 | 0.08 | 0.96 |
Density – WM | Rotifers only | Log10 + 1 | 0.10 | 0.66 |
SSU – WM | Rotifers only | Log10 + 1 | 0.02 | 0.96 |
Note: For the morphological data, we estimated the abundance of all zooplankton taxa and the biomass of crustacean zooplankton only. Significant rRV coefficient tests are shown in bold.
Zooplankton diversity patterns over environmental gradientsWe investigated the relationship between family-level zooplankton diversity metrics (richness and Shannon diversity) and major environmental gradients identified across eastern Canada (the HII, TP, specific conductivity, and lake depth). Our analyses showed that several consistent relationships were apparent across the analytical platforms. For richness, significant negative relationships were observed between TP and both metagenomic datasets (SSU: adj. R2 = 0.24, p = 0.02; WM: adj. R2 = 0.29, p = 0.01) (Figure 6 and Table S3). A negative but non-significant relationship (p = 0.24) was observed between TP and richness derived from morphological identifications. We also detected negative relationships between richness and HII, but the fit once again was only significant for the metagenomic datasets (SSU: adj. R2 = 0.23, p = 0.022; WM: adj. R2 = 0.33, p = 0.006; morphology: adj. R2 = 0.07, p = 0.14). Finally, a marginally significant relationship (adj. R2 = 0.12, p = 0.08) was observed between richness derived from the whole metagenome dataset and specific conductivity.
FIGURE 6. Diversity metrics (left: Taxonomic richness, right: Shannon entropy) estimated from microscopy, predicted SSU rRNA genes (SSU), and whole metagenome (WM) datasets plotted against environmental gradients using a generalized additive model (GAM). Environmental data were log10 transformed, except for the HII (expressed as a percentage), which was arcsine transformed. Red and yellow backgrounds identify significant and marginally significant relationships, respectively. Adjusted R-squared and p-values for each GAM are listed in Table S3)
For Shannon diversity, a significant fit was found between the predicted SSU rRNA gene data and TP (adj. R2 = 0.18, p = 0.03) and between predicted SSU rRNA gene data and HII (adj. R2 = 0.15, p = 0.056). The fit between Shannon diversity derived from whole metagenome data and specific conductivity was marginally significant (adj. R2 = 0.11, p = 0.098) and no significant relationships emerged with lake depth (Figure 6 and Table S3).
DISCUSSIONConsistent with other comparative analyses between eDNA metagenomics (i.e., shotgun sequencing) and morphological approaches (Singer et al., 2020; Stat et al., 2017), our results show that even though the match between techniques is not perfect, we were able to detect modest congruence in taxon relative abundance across platforms and varying levels of congruence between analytical platforms when we considered the presence–absence data. Interestingly, diversity metrics across all analytical platforms showed similar responses to epilimnetic phosphorus concentration, which is often considered a limiting nutrient in many lakes in eastern Canada. Many important improvements can be implemented in future metagenomic work to help refine the robustness of this approach applied to metazoan biodiversity eDNA surveys as well as the resolution of the taxonomic identifications which is currently limited by the availability of genomic reference databases (Section 4.3).
To what extent do water metagenomes represent zooplankton biodiversity?Using a variety of statistical approaches, we found that zooplankton communities surveyed using morphological counts and metagenomic analyses were, at best, moderately correlated. While local diversity metrics were similar across platforms, whole metagenome analysis detected the highest richness of zooplankton taxa. It is interesting to note that the congruence between analytical platforms varied with the type of data analyzed (i.e., presence–absence vs. relative abundance data; Figure 5, Table 2). The gene prediction approach based on SSU rRNA was insufficient to detect all taxa present in the microscopy samples at all sites, but the abundances derived from this workflow appeared to be more reflective of the zooplankton specimen counts compared to the whole metagenome approach where some taxonomic groups (e.g., the Daphniidae family) appear to have an inflated number of reads.
It is also informative to compare the strength of our results with other eDNA—morphological comparisons. For example, Keck et al. (2022) conducted a meta-analysis of comparative metabarcoding and morphological studies, and found that eDNA detected significantly more taxa than morphological counts, as eDNA may contain traces of taxa distributed outside of the immediate sampling area. Although no such synthetic analysis has been done from metagenomes, we would expect a similar finding. In our molecular dataset, we found four families of zooplankton that were not recorded as part of the morphological survey, but these taxa are either generally characterized as benthic or littoral associated so may not have been present as individuals in the immediate sampling area at the deepest point in the lake. For instance, the Bdelloid rotifers Adinetidae and Philodinidae, which we only found via the whole metagenome analysis, are typically found to live on plants or debris in waters with dense vegetation and are generally not caught in open-water plankton tows (Wallace & Snell, 2010). Across the metabarcoding and metagenomic literature, many have argued that eDNA approaches are more complementary to morphological approaches rather than directly exchangeable, and the coherence between metagenomic and metabarcoding for eukaryotic diversity surveys needs further detailed investigation (Cordier et al., 2020; Garlapati et al., 2019).
Shotgun sequencing reveals diversity patterns over broad environmental gradientsTo explore diversity patterns over broad environmental gradients and among analytical platforms, taxonomic richness and Shannon diversity metrics were plotted against common environmental drivers in lakes: epilimnetic TP, specific conductivity, lake depth, and HII estimated in the 22 lakes (Huot et al., 2019). Based on our dataset, we found relatively consistent patterns in zooplankton diversity across analytical platforms (Table 2) and in relation to environmental drivers (Figure 6, Table S3), indicating that the shotgun sequencing approach shows promise for investigating ecological gradients in freshwater systems. Our findings are also consistent with results reported by Singer et al. (2020) from a marine system, where despite revealing contrasting taxonomic diversity, both the metagenomic and metabarcoding data revealed similar ecological patterns, which in turn were useful to infer factors related to the ecosystem health and function.
Limitations to the metagenomic approach and prospectsLimitations of eDNA-based approaches have been widely studied, although these have been primarily based on PCR-based approaches. Challenges relate mostly to the availability, heterogeneity, and quality of eDNA itself in water (Goldberg et al., 2016), whereby investigators have identified the conditions contributing to eDNA degradation (Barnes et al., 2014) or the transport of eDNA over long distances (Deiner & Altermatt, 2014). Additionally, human errors during the sequencing library preparation workflow may contribute to variation in sequencing efficiency (e.g., differential efficiency of DNA extraction, pipetting errors during DNA quantification, and sequencing library preparation). Although our knowledge of these factors is constantly improving the robustness of eDNA research, there are also other aspects of the workflow—from sampling strategies to bioinformatics—which need to be improved to strengthen metagenomic approaches applied to the study of metazoans in the environment.
Methodological considerationsOverall, we are looking at divergent sampling and data processing efforts across approaches, which may have brought about some of the differences in zooplankton diversity assessed using morphological and eDNA methods. Our metagenomes were produced for the primary purpose of examining bacteria and archaea and thus the volume of water filtered was only ~250–500 ml, depending on how much water could be passed through the 0.22 μm membrane before it clogged. In contrast, the morphological identification of zooplankton was performed on tow-haul samples collected from tens of liters across the full water column that was then concentrated to a few hundred milliliters. Furthermore, because the samples for eDNA analysis were collected over the photic zone only, they might not fully represent the rotifer and crustacean zooplankton samples which were collected from below the thermocline and over the full water column, respectively. These factors, as well as the lack of field and laboratory blanks (possibly leading to false positives), the lack of field replication, and the differential sequencing efficiency (possibly leading to substantial variation in the number of raw reads among metagenomes) may have contributed to lower diversity estimates and noise characterizing zooplankton assemblages. However, these biases are likely to mostly affect the detection of rare taxa and do not appear to hinder the estimation of relative abundance in the dominant taxa, which was found to be rather consistent among lakes and analytical platforms in this study and in other metagenomic datasets (Singer et al., 2020). Given the very large environmental gradients captured within the context of our study, lake variation is likely as small as has been shown for other LakePulse network analyses (e.g., Griffiths et al., 2021).
Filter pore size selection could also play an influential role in fractionating zooplankton eDNA and controlling the volume of water filtered. We opted for a size selection step where we first pre-filtered water through a 100 μm mesh, which allowed us to filter larger volumes of water onto the smaller pore-size membrane while selecting mostly for extra-organismal DNA (i.e., DNA that is no longer found within an organism, as opposed to organismal DNA; Rodriguez-Ezpeleta et al., 2021). It is however likely that gametes and other juvenile stages in cladocerans and copepods passed through the 100 μm mesh and got caught on the 0.22 μm membrane, which may have contributed to the inflated number of reads assigned mainly to Daphniidae and copepods in the whole metagenome dataset. Similarly, some rotifers are smaller than 100 μm in size, and therefore selecting a filter with a pore size better suited to our target organisms may lead to higher overall coverage by minimizing the allocation of sequencing effort to DNA from non-target microbial taxa. For instance, 0.45 μm cellulose nitrate filters have been shown to yield consistent results for fish eDNA metabarcoding with high repeatability between filtration replicates (Li et al., 2018). Type of filter, pre-filtration step, and pore size have all been identified as factors determining the final yield of eDNA, with differences observed between taxa and systems (Bowers et al., 2021).
Filtering larger volumes of water combined with an increased sequencing depth may help yield a higher number of reads and slightly more diversity for Eukarya, which are very much underrepresented in metagenomes in contrast with bacteria and archaea. Similar to earlier research, we found that the proportion of recovered eukaryotes tends to be <1% of the total read assignments, either with a genome-wide approach (Cowart et al., 2018; Stat et al., 2017) or a gene-centric approach (Tedersoo et al., 2015). In contrast, the filtration of 10 L of water targeting extracellular DNA combined with ultra-deep sequencing yielded ~100 million reads per metagenome from a brackish lagoon and improved the coverage for Eukarya to a proportion corresponding to over 4% of the total number of reads (Manu & Govindhaswamy, 2021). Combining field replicates would also be advisable, as this would likely increase the recovered diversity and help reduce within lake variability. Other emerging target enrichment techniques such as hybridization capture have great potential to improve the detection of metazoans in metagenomes (Seeber et al., 2019; Sevigny et al., 2021). Hybridization capture utilizes RNA probes designed to bind the gene region of interest, enhancing the signal of desired taxa without introducing PCR-induced biases. Recent results based on ultra-deep sequencing have shown that the coverage for eukaryotes may be improved when combining shotgun sequencing with DNA target-capture methods (Manu & Govindhaswamy, 2021). Alternatively, metatranscriptomics of RNA sequencing data is an emerging and promising approach for characterizing zooplankton communities. A recent study by Lopez et al. (2021) comparing zooplankton estimates from observational with both amplicon sequencing and metatranscriptomics datasets has revealed higher congruence of observational zooplankton abundance and composition with metatranscriptomics estimates compared to amplicons sequencing using genomic (gDNA) and complementary DNA (cDNA) amplicon sequencing.
Bioinformatics considerationsCarefully designed bioinformatics workflows are crucial for robust taxonomic assignment of metagenome reads. Our data suggest that analyzing whole metagenomes captures the widest pool of biodiversity, but narrowing the analysis to taxonomically informative gene markers, such as the SSU rRNA genes in eukaryotes, better reflected the observed relative abundance of zooplankton families based on microscopy. We also found that in using a targeted approach with SSU rRNA genes as taxonomic markers, several taxa were missing or did not get taxonomically assigned using the LCA algorithm, even though they were present in relatively high abundances in microscopy counts. Some taxa were consistently missing or almost absent from our genomic datasets (such as the Bosminidae), despite being one of the most abundant taxonomic groups identified using microscopy. Such incongruences have been reported between microscopy and metabarcoding data (see Keck et al., 2022) as well as in the more scarce body of literature where the congruence between metagenomic and microscopy datasets have been explored (Singer et al., 2020). We hypothesize that a large part of the issue with missing taxa in our metagenomes is caused by the same bioinformatic limitations as in any genetic study: the current lack of exhaustive and curated reference sequence databases limits our ability to assign taxonomy to reads. In molecular genetic datasets, and especially in shotgun sequencing data, a large portion of the reads only gets assigned to Phylum level or lower. These reads are typically filtered out bioinformatically, meaning that they are not considered in the estimation of diversity indices or in comparisons with other datasets. Therefore, we might be widely underestimating the abundance and diversity of certain taxonomic groups which are not populated in reference databases. Here, we opted to bin metagenomic reads at the Family level to retain as many reads as possible, including those that could not be determined at Genus or Species level. This is not as informative ecologically, but with the current state of reference databases for zooplankton, and the limited taxonomic information offered by unassembled short reads (<150 bp in length) this is the best we can achieve at the moment.
Since using the full metagenome read set is computationally intensive and does not appear to yield a higher correlation with morphology-based identifications, a reasonable compromise that might increase coverage without multiplying computation efforts could be the combination of a few targeted genetic markers. For example, metagenome reads mapped against both the SSU and LSU rRNA gene markers have been shown to improve taxa recovery in a study of marine plankton from DNA preserved in marine sediment (Armbrecht, 2020). In addition to nuclear SSU rRNA genes, we investigated the mitochondrial cytochrome c oxidase subunit I (COI) gene using the same prediction approach based on metagenome reads but found the coverage for this marker to be very low for metazoans. For this reason, we did not pursue further metagenomic COI gene prediction analyses.
There is clearly an urgent need for curated nucleotide sequence and genome reference databases to improve the interpretation of eDNA-based datasets. This is especially the case for freshwaters, where monitoring efforts are limited and yet provide habitat for a disproportionate number of taxa per unit area (Strayer & Dudgeon, 2010). Currently, there are insufficient observational data for many taxa such that we cannot even assess the state of at least ~40% of freshwater species in Canada (Desforges et al., 2022). Initiatives to improve sequencing coverage of eukaryotic biodiversity are underway, including the Barcode of Life Data System (BOLD) (Ratnasingham & Hebert, 2007), the International Barcode of Life (IBOL) (
In this study, metagenomics and classical morphological analyses of zooplankton applied to 22 freshwater lakes yielded contrasting abundance estimates but comparable diversity assessments at the family level. Metagenomics detected a greater richness of taxa, including some that generally live outside the pelagic photic zone where the samples were taken, which is to be expected given the persistence and transport of eDNA in nature. Although metagenomic techniques still need to be improved with better-adapted sampling protocols and refined bioinformatics pipelines specific to eukaryotic genomes, our results based on two contrasting workflows for the analysis of aquatic metagenomes suggest the enormous potential for extending such analyses to the investigation of zooplankton and other aquatic and non-aquatic micro- and macro-eukaryotes. Our comparative study contributes to a better understanding of how the metagenomic approach might contribute to biodiversity and ecological assessments in complement to other routinely applied traditional and eDNA surveys.
AUTHOR CONTRIBUTIONMÈM, IGE, MEC, and DAW designed the study. REG and SAK made major contributions to the acquisition and interpretation of the metagenomic data. BEB provided the morphological zooplankton data. MÈM, SAK, and REG performed data analysis and MÈM and REG produced the figures. MÈM wrote the first draft of the manuscript and all authors contributed to editing and revising the manuscript.
ACKNOWLEDGEMENTSWe thank the NSERC-funded LakePulse Network (2017–2019), Yannick Huot for leading LakePulse, Marie-Pierre Varin, and Maxime Fradette for coordinating sampling and lab work. We are grateful to Vera Onana (Concordia University) for her contribution to producing some of the molecular data. We additionally thank Cindy Paquette (UQAM) for providing curated zooplankton identification and count tables, and Michelle Gros (McGill University) for the rotifer identification and help to compile zooplankton data. We also wish to acknowledge the contribution of the high-throughput sequencing platform of the McGill University and Genome Québec Innovation Centre (Montreal, Canada).
CONFLICTS OF INTERESTSThe authors declare no conflict of interest.
DATA AVAILABILITY STATEMENTSequence data were submitted to the EBI metagenomics platform for analysis and archiving under Study MGYS00003941. The raw metagenomic data are stored under accession numbers ERS2799837, ERS2799839, ERS2799840, ERS2799842, ERS2799843, ERS2799847, ERS2799851, ERS2799859, ERS2799875, ERS2799886–ERS2799888, ERS2799890, ERS2799911, ERS2799929, ERS2799930, ERS2799932–ERS2799935, ERS5372506, ERS5372513.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2022. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Molecular genetic approaches applied to environmental DNA have great potential for biodiversity research and ecosystem monitoring. A metagenome is produced via shotgun sequencing of DNA collected directly from the environment and represents a sample of genetic information from all organisms captured in an environmental sample. Metagenomes have been primarily used to study bacteria and archaea, but promising reports focusing on metazoan diversity are emerging. However, methodological uncertainties remain, and studies are required to validate the power and the limitations of such an approach when applied to macro‐eukaryotes. Here, we analyzed water sample metagenomes to estimate zooplankton diversity in 22 freshwater lakes across eastern Canada. We tested the coherence of data based on field samples collected at the same time: 1) morphologically identified zooplankton specimens and 2) molecular genetic data derived from shotgun sequencing of environmental DNA for which we applied two different bioinformatic workflows: a whole metagenome mapping approach and a small subunit (SSU) rRNA gene prediction approach. We further evaluated diversity trends emerging from each dataset in relation to major environmental gradients. We found a significant correlation between the relative abundance of zooplankton families identified based on SSU rRNA gene prediction and morphology. However, differences in congruence between metagenomes and morphological identifications were detected when varied bioinformatic approaches were applied to the presence–absence data. This study presents one of the first diversity assessments of a group of aquatic metazoans using metagenomes and validates the coherence of the community composition derived from genomic and classical species surveys. Overall, our results suggest that metagenomics has the potential to be further developed to describe metazoan biodiversity in aquatic ecosystems, and to advance this area we provide key recommendations for workflow improvement.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details







1 Department of Biology, McGill University, Montreal, QC, Canada; Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC, Canada
2 Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC, Canada; Department of Biology, Concordia University, Montreal, QC, Canada
3 Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC, Canada; Department of Biology, Concordia University, Montreal, QC, Canada; Department of Biological Sciences, University of Quebec at Montreal, Montreal, QC, Canada