INTRODUCTION
The explosion of environmental sequencing data in the last decade has fueled a deeper understanding of the role of microbiomes in shaping human health, ecosystem function, and the Earth’s biogeochemical cycles (1). Further advancements in microbiome science require improved experimental approaches that link genomes to their
Traditional DNA-SIP studies use 16S rRNA gene sequencing to identify labeled microorganisms (7, 8), and several analysis tools are available for DNA-SIP data (9–11). In addition to identifying microbial groups as either labeled or unlabeled, analysis tools such as delta BD (ΔBD) (12) and quantitative SIP (qSIP) (11) can also estimate the extent of isotope assimilation as atom fraction excess (AFE), which is the increase in the isotopic composition of DNA above background levels (11). Measurements of AFE can inform
In recent years, multiple SIP studies have used metagenome sequencing in addition to, or in place of, 16S rRNA gene amplicon sequencing (18–23). We refer to this general approach as “SIP metagenomics” from here on to distinguish it from DNA-SIP using 16S rRNA genes. Some recent studies have applied the qSIP approach to shotgun sequencing data to estimate the isotopic enrichment of soil metagenome-assembled genomes (MAGs) (24–26). While these represent exciting advancements in the field, SIP metagenomics faces several data analysis and interpretation challenges. For example, estimates of isotopic enrichment depend on accurate measurements of absolute genome abundance, but determining genome abundance from metagenomic data is difficult due to its compositional nature (27–30). In addition, outstanding questions remain regarding optimal assembly strategies and the specificity and sensitivity of analysis tools, given varying sequencing depth and genome coverage. Empirically answering these questions requires a defined experiment where the identity of labeled genomes and their level of isotopic enrichment is known
Here, we explored SIP metagenomic sample processing and analysis strategies using an environmental microbiome amended with isotopically labeled
RESULTS
To create a ground truth dataset for assessing SIP metagenomics, we generated a microbial community DNA sample where the identity of labeled genomes and their level of enrichment was known
FIG 1
Experimental design and overview of laboratory steps in the SIP metagenomics workflow. To create a defined SIP experimental sample, DNA extracted from an unlabeled freshwater microbial community was amended with either labeled (13C) or unlabeled (12C)
FIG 2
The workflow scheme for SIP metagenomic data analysis includes (A) quality filtering of the raw reads and (B) generation of a unique set of medium- and high-quality MAGs used for (C) quantification of absolute taxa abundances and identification of isotope incorporators. The addition of sequins provides the means for calculating absolute bacterial abundances (C, Data Normalization), and pre-centrifugation spike-ins aid in the detection of anomalous samples (C, Outlier Handling).
To develop an empirically validated workflow for SIP metagenomics, we next created the
Maximizing recovery of metagenome-assembled genomes using individual and combined assemblies
In contrast to a typical metagenome sample, community DNA in an SIP experiment is separated into multiple fractions based on BD prior to sequencing (Fig. 1). Differences in GC content and levels of isotopic enrichment result in a non-random distribution of microbial genomes across the density gradient, and sequencing each density fraction provides multiple options for assembly and binning. To determine the optimal strategy for maximizing MAG recovery, we compared assembly of the intact unfractionated sample, separate assemblies of each individual fraction, co-assemblies of all fractions derived from the same initial samples, and a massive combined assembly using MetaHipMer (33) of all fractions from all samples. The latter three strategies all used the same 1,418 Gbp of sequence data from hundreds of sequencing libraries, grouped in different ways for each strategy, while the unfractionated assembly used only 47 Gbp from one sequencing library. Each assembly was then independently binned using MetaBAT2 (34).
A total of 2,022 MAGs were generated across all assemblies, of which 248 were high quality, 447 were medium quality, and 1,327 were low quality as defined by the minimum information about metagenome-assembled genomes (MIMAG) reporting standards (35) (Data Set S2). The MetaHipMer assembly produced more MAGs than any other strategy. A total of 235 MAGs were recovered from the MetaHipMer assembly, of which 136 were medium or high quality (Fig. 3A). Estimates of average MAG completeness and contamination for each assembly type were not substantially different (Fig. S1).
FIG 3
Comparison of metagenome assembly approaches for the SIP metagenome dataset generated from spiking
Next, we deduplicated all the medium- and high-quality MAGs recovered from all assemblies to determine whether any approach generated unique MAGs that were not present in other assembly types (Fig. 2B). We first grouped MAGs with average nucleotide identities of ≥96.5% and alignment fractions of ≥30% into a total of 148 unique clusters (36), then selected a single representative MAG for each cluster. Of these, 120 MAG clusters were exclusively produced by MetaHipMer. Twelve MAG clusters did not include any MetaHipMer-generated MAGs, and 11 of these clusters contained at least one MAG generated from the assemblies of individual fractions (Fig. 3B). Assembly of the intact unfractionated microbiome did not produce any unique MAGs (Fig. 3B), presumably because the sequencing depth for the unfractionated sample (47 Gbp) was much smaller than the total sequencing depth of all the individual fraction assemblies, the sample-wise combined assemblies, and the MetaHipMer assembly (1,418 Gbp). The different assembly strategies also produced MAGs with different taxonomic compositions. For example, MAGs derived from the MetaHipMer assembly accounted for an additional nine classes that were not present in other assemblies (e.g.,
Anomalous sample detection using pre-centrifugation spike-in controls
As part of the quality control process, we devised an approach for detecting anomalous samples whose pre-centrifugation spike-in sequences displayed aberrant distributions along the BD gradient (Fig. 2C). We added six synthetic spike-ins to our samples prior to ultracentrifugation, and each spike-in had a different density based either on its GC content or the artificial introduction of 13C-labeled nucleotides during oligo synthesis (Table S2 at https://doi.org/10.6084/m9.figshare.22280632); therefore, each spike-in has a distinct and predictable peak in coverage along the BD gradient. Deviations from the expected spike-in distribution patterns may indicate events such as cross-contamination, library misidentification, or accidental disturbances of the density gradient significant enough to distort the distribution of MAGs throughout the gradient, all of which would introduce error into the downstream analysis. We identified three biological replicates with anomalous spike-in distribution patterns (Fig. S2), and these samples were removed from downstream analyses to avoid the introduction of extraneous noise. This example illustrates the utility of internal standards to illuminate quality control problems in SIP experiments that would otherwise go undetected.
Normalizing genome coverage to quantify DNA isotope incorporation
Accurate abundance measurements are critical for determining levels of isotopic labeling. Briefly, models such as qSIP and ΔBD estimate a taxon’s AFE based on differences between its weighted BD in unlabeled controls and isotope-amended treatments (9, 11, 37), where weighted BD is calculated from the taxon’s abundance within each density fraction (see Materials and Methods, Equations 5 and 6). For amplicon-based qSIP studies, the relative abundance of a taxon is normalized to the total number of 16S rRNA gene sequences within each fraction determined by qPCR (11). Estimating abundance in SIP metagenomic studies is more complicated, since shotgun sequencing lacks an equivalent method to 16S rRNA gene qPCR for absolute abundance scaling. Previous SIP metagenomic studies multiplied relative genome coverage with the total DNA concentration of each fraction (25, 26), which is a reasonable approach, although it does not account for potential variability introduced during DNA recovery, library creation, and sequencing of each fraction (29, 30, 38). By adding sequins to density fractions before DNA precipitation and recovery, we explored an alternative normalization strategy for measuring absolute abundance that could also account for variability in the downstream processing steps (24). In this approach, genome coverage within each fraction can be converted into absolute abundances through normalization based on the known concentration and observed coverage of the sequin internal standards. The AFE of each genome can then be estimated from these abundance measurements.
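The sequin normalization described above can be sketched as a per-fraction calibration. This is a minimal illustration, not the authors' implementation: it assumes an ordinary least squares fit on log10-transformed values (the text specifies only a regression of known sequin concentration on observed coverage), and the function names `fit_sequin_calibration` and `coverage_to_abundance` and the example numbers are hypothetical.

```python
import math

def fit_sequin_calibration(sequin_coverage, sequin_concentration):
    """Fit log10(concentration) = intercept + slope * log10(coverage) by OLS.

    Assumption: a log-log linear model; sequins with zero coverage are dropped
    because their logarithm is undefined.
    """
    pairs = [(math.log10(cov), math.log10(conc))
             for cov, conc in zip(sequin_coverage, sequin_concentration)
             if cov > 0 and conc > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def coverage_to_abundance(coverage, slope, intercept):
    """Convert a MAG's coverage in one fraction to an absolute concentration."""
    return 10 ** (intercept + slope * math.log10(coverage))

# Hypothetical fraction: sequin coverage is exactly twice the known concentration.
slope, intercept = fit_sequin_calibration([2.0, 20.0, 200.0, 0.0],
                                          [1.0, 10.0, 100.0, 0.5])
mag_abundance = coverage_to_abundance(40.0, slope, intercept)
```

Because the calibration is fitted separately in each fraction, losses during DNA recovery, library creation, and sequencing affect the sequins and the MAGs alike and are absorbed by the fit.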
Our experimental design, where isotopic enrichment levels were known
TABLE 1
Performance of different approaches for calculating genome abundance across density fractions based on the results from spiking 13C-labeled
Method | Procedure | Specificity | Sensitivity | Spearman correlation between estimated and true AFE ( |
---|---|---|---|---|
Absolute abundance using sequins | Regression using sequin coverage and concentration | 0.993 | 0.857 | 0.86 (0.012) |
Absolute abundance using total DNA concentration | Product of relative abundance and DNA concentration (25) | 0.991 | 0.714 | 0.83 (0.021) |
Absolute abundance using total DNA concentration | Product of relative coverage and DNA concentration (24) | 0.922 | 0.571 | 0.38 (0.4) |
Relative coverage | Relative coverage of MAGs in each fraction | 0.999 | 0.571 | 0.77 (0.041) |
AFE was predicted using the qSIP model. Specificity was estimated as (true negatives)/(false positives + true negatives). Sensitivity was estimated as (true positives)/(true positives + false negatives).
FIG 4
Comparison of predicted AFE versus the expected AFE of
Abundance estimates derived from the sequin approach outperformed all other approaches based on combinatorial assessment of specificity (lower false positives), sensitivity (lower false negatives), and the Spearman correlation between expected and predicted AFE values (Fig. 4 and Table 1; Table S3 at https://doi.org/10.6084/m9.figshare.22280632). The two approaches using total DNA concentrations underestimated levels of AFE, and one approach did not produce statistically significant linear regressions (
Comparison of various SIP analysis methods
In addition to qSIP, other analysis methods such as ΔBD (9), HR-SIP (9), and MW-HR-SIP (10) can identify labeled taxa. We compared all four approaches for their ability to accurately identify isotope incorporators in our defined SIP metagenomic dataset. We also compared estimates of
Estimates of
FIG 5
Comparison of AFE estimates produced by the (A) qSIP and (B) ΔBD methods using the amended metagenome where levels of
TABLE 2
Comparison of methods to identify isotopically labeled genomes
Incorporator identification method | False positives | Specificity | Sensitivity | Balanced accuracy |
---|---|---|---|---|
qSIP model | 7 | 0.993 | 0.857 | 0.925 |
ΔBD method | 12 | 0.984 | 0.857 | 0.921 |
HR-SIP | 9 | 0.991 | 0.571 | 0.781 |
MW-HR-SIP | 4 | 0.996 | 0.857 | 0.927 |
Evaluations were based on absolute genome abundances obtained by normalizing coverage to internal sequin standards using the sequin approach. Specificity and sensitivity were averaged across the seven treatment conditions.
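For reference, the metrics reported in Tables 1 and 2 can be computed from confusion-matrix counts as sketched below. Specificity and sensitivity follow the definitions given with Table 1. Balanced accuracy is taken here as the mean of the two, which is an assumption, although the Table 2 values are consistent with it (for example, (0.993 + 0.857)/2 ≈ 0.925). The counts in the example are hypothetical and chosen only to reproduce rates of that size.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return specificity, sensitivity, and balanced accuracy from raw counts."""
    specificity = tn / (tn + fp)   # true negatives / (false positives + true negatives)
    sensitivity = tp / (tp + fn)   # true positives / (true positives + false negatives)
    balanced_accuracy = (specificity + sensitivity) / 2  # assumed definition
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical counts: 6 of 7 labeled genomes detected, 7 false positives among 1,000 unlabeled.
metrics = classification_metrics(tp=6, fp=7, tn=993, fn=1)
```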
Lower limits of genome coverage for reliable detection of isotope labeling
Next, we examined how sequencing depth affected our ability to detect isotope incorporation. As demonstrated above, the accuracy of abundance measurements impacts the accuracy of AFE estimates, and these abundance measurements are derived from genome sequence coverage. The relative abundance of microbial taxa comprising complex communities can vary by orders of magnitude; thus, genome coverage within sequencing libraries can vary similarly (39). This suggests that AFE estimates might be less reliable for taxa with low coverage. To determine the lowest depth of coverage at which an AFE could be accurately estimated, we performed qSIP and MW-HR-SIP analyses after subsampling
The qSIP model consistently identified
Strategies to improve accuracy of detecting isotopically labeled genomes
To improve the accuracy of SIP metagenomic analysis, we explored different strategies to reduce the number of genomes incorrectly identified as labeled (i.e., false positives). For example, the number of false negatives increased as coverage decreased; therefore, we tested whether implementing minimum genome coverage requirements could reduce the number of false positives. Excluding genomes with mean total coverages < 10× reduced the total number of MAGs analyzed from 147 to 113 and reduced false positives identified by qSIP from 7 to 4 without increasing false negatives (Table S7). This improved the balanced accuracy from 0.925 to 0.927. Raising the minimum mean total coverage to 17× eliminated all false positives (Fig. S5), but this also reduced the number of remaining MAGs analyzed to only 68. We did not test coverage limits for MW-HR-SIP because the method struggled to detect
We also investigated if false positives could be reduced by implementing a minimum level of isotopic enrichment necessary for a genome to be considered labeled. That is, rather than simply requiring genomes to be significantly greater than 0% AFE, which is the default setting of the qSIP approach (11), we examined different minimum AFE thresholds ranging from 2% to 12.5% (Table S8 at https://doi.org/10.6084/m9.figshare.22280632). A genome was considered to be labeled if the lower bound of its AFE 95% CI was greater than the minimum AFE threshold. With AFE thresholds between 2% and 6%, total false positives dropped from 7 to 3 across all experimental treatments, but
The number and identity of false positives varied across SIP analysis methods, presumably due to differences in the methods’ underlying algorithms. Therefore, we hypothesized that the number of false positives might be reduced by taking the consensus of different analysis methods, i.e., requiring that two separate models predict a MAG is labeled. All false-positive MAGs found in qSIP analysis were also false positives in ΔBD analysis; thus, taking the consensus of these two methods did not produce fewer false positives than qSIP alone (Table S9 at https://doi.org/10.6084/m9.figshare.22280632). In contrast, there was no overlap in the identity of false-positive MAGs between the qSIP and MW-HR-SIP methods, and taking the consensus (intersection) of their results completely eliminated false positives without producing any false negatives (Table S9 at https://doi.org/10.6084/m9.figshare.22280632). However, we found it more advantageous to apply MW-HR-SIP and qSIP sequentially rather than independently. MW-HR-SIP had greater specificity than qSIP; therefore, it was used as a first-pass filter to detect putatively labeled genomes while minimizing false positives. This subset of putatively labeled genomes was then re-analyzed with the qSIP model. Only genomes first identified as labeled by MW-HR-SIP and later confirmed with a significantly positive AFE by qSIP were considered labeled. Applying the tools in series reduced the number of multiple hypotheses tested (i.e., MAGs tested for enrichment), which in turn increased the statistical power for AFE estimation. That is, without the initial reduction in identified incorporators, the qSIP analysis would have otherwise included all MAGs in its statistical comparisons between treatment groups, resulting in a smaller
DISCUSSION
DNA-SIP has been an established method in microbial ecology for many years and has primarily relied on 16S rRNA gene sequencing to identify active taxa (4, 5, 11, 15, 18, 40). With decreases in sequencing costs and increases in compute capacity, DNA-SIP studies can now utilize shotgun metagenomic sequencing to establish links between population genomes and
Comparing assembly strategies for SIP metagenomic data was a key goal of our study. Previous SIP studies have used different strategies, including assembling unfractionated DNA, assembling individual SIP fractions, and co-assembling several fractions (24 - 44 - 45 - 45). However, it was not clear which assembly strategy produces the most medium- and high-quality MAGs. For instance, in computationally simulated SIP experiments, the co-assembly of multiple fractions improved MAG recovery compared with the assembly of unfractionated DNA (45). In addition, the large amount of sequence data used in co-assemblies can recover rare genomes that would otherwise be lost due to insufficient coverage in smaller assemblies of individual datasets (33). Conversely, individual assemblies can outperform co-assemblies in samples where high levels of microdiversity impede contig formation (46 - 48). Here, we found that co-assembly of all density fractions generated the most medium- and high-quality MAGs, which agrees with two recent SIP metagenomics studies (25, 26). However, we also found that merging binning results from individual fraction assemblies and larger co-assemblies via MAG de-replication provided more medium- and high-quality MAGs than did co-assembly alone. We posit that this approach reaps the benefits of both strategies: it provides higher read recruitment for assembling rare genomes in co-assemblies and also leverages lower microdiversity in individual fraction assemblies. Optimal assembly strategies may differ for other environmental samples, and these strategies must be re-evaluated as sequencing and assembly methods evolve, but our results suggest that SIP metagenomic studies can benefit from employing multiple assembly approaches to maximize genome recovery.
Processing DNA-SIP samples is laborious, but semi-automated protocols simplify lab work and enable high-throughput SIP metagenomic studies (26). Indeed, increasing the number of biological replicates and sequencing more density fractions per replicate can improve the detection of labeled taxa (41). However, the opportunities for accidental mistakes, such as cross-contamination, sample mix-ups, or clerical errors, also increase when processing dozens of samples and hundreds of density fractions. In addition, slight mishandling of ultracentrifuge tubes can disturb delicate CsCl gradients (8) and potentially alter genome distributions along the gradient. If left undetected, these types of accidents could produce inaccurate weighted BD estimates, adding extra noise to the data analysis and even compromising results. In this study, we found that including pre-centrifugation spike-ins, which had distinct and predictable distribution patterns along the gradient, helped us identify and remove problematic samples before they negatively impacted our analyses. Including internal standards can mitigate potential errors and enhance the quality of large complex SIP studies with many replicates. Moreover, with careful design and additional development, internal standards might someday correct for variability introduced during sample processing (41) instead of simply flagging samples for removal. Internal standards can be easily incorporated into automated SIP metagenomics protocols (26), where they can improve the quality of SIP metagenomic results, and if adopted broadly, potentially serve as consistent fiducial reference points that facilitate inter-comparisons of different SIP studies.
Accurate measurements of genome abundance along the BD gradient are essential for identifying labeled genomes and determining their level of isotopic enrichment (11). However, the compositional nature of metagenomic data, and the variability introduced during sample processing and sequencing, can hamper quantitative estimates of genome abundance (27 - 30 - 49). Internal quantification standards can mitigate process variability and provide absolute abundance estimates of genes, transcripts, and genomes from metagenome and metatranscriptome data (30, 38, 50 - 53). Based on these findings, we hypothesized that adding internal standards to density fractions (sequins) could improve abundance measurements, thereby improving isotope enrichment measurements. Indeed, estimates of AFE in our study were more accurate using absolute abundances derived from sequin normalization compared with AFE estimates using other strategies.
Multiple factors could explain the more accurate estimates of isotopic labeling enabled by internal quantification standards. For one, sequins may have mitigated any variation introduced during library creation and sequencing (30). Additionally, sequins may have corrected for differences in DNA recovery among fractions that would have otherwise gone unnoticed and negatively impacted abundance measurements. That is, after collecting CsCl fractions, each fraction separately undergoes PEG precipitation and desalting before DNA concentrations are measured (26). Absolute abundances calculated using DNA concentrations assume identical DNA recovery efficiencies (24, 25), so any stochastic or systematic variability in the percent of DNA recovered would lead to errors in absolute abundance measurements. Conversely, sequins track and mitigate variability in DNA recovery when they are added to fractions before the desalting steps, as was performed here. Therefore, if DNA recovery efficiency varied among fractions, then we would expect absolute abundances derived from sequins to be more accurate than estimates derived from DNA concentration measurements. Without internal standards, variability introduced during DNA recovery, library construction, and sequencing is unknowingly propagated as noise into downstream SIP analyses. This undetected variability can potentially lead to errors that impact predictions of isotope enrichment.
The various SIP analysis methods examined in this study use different approaches to detect labeled microorganisms, and these differences could impact the sensitivity and specificity of their predictions. The accuracy of different SIP analysis methods has not to our knowledge been assessed with metagenomic data until now, but
Altogether, we used an environmental metagenome amended with isotopically labeled
MATERIALS AND METHODS
DNA collection and microbiome amendments
To create a microbiome where the identity of labeled genomes and their level of enrichment was known
DNA from a complex microbial community was recovered from an outdoor, man-made pond located at the Joint Genome Institute. Pond water was pre-filtered through a 5-µm mesh before collection onto 0.2-µm Supor filters (Pall; 47 mm diameter). DNA was extracted from filters using a DNeasy PowerWater kit (Qiagen; 14900-50-NF).
Replicate samples were prepared for ultracentrifugation by combining 900 ng of microbiome DNA with 50 ng of
Synthetic pre-centrifugation DNA spike-ins
A set of six synthetic DNA fragments was added to mixtures of DNA from isolates and the complex microbiome to track the ultracentrifugation and fraction collection steps. These fragments were approximately 2 kbp in length with GC content of 37%–63% (Table S2 at https://doi.org/10.6084/m9.figshare.22280632). To change the distribution of fragments across the density gradient, some fragments were artificially enriched with 13C through PCR by adjusting the ratio of unlabeled dNTPs and uniformly labeled 13C dNTPs (Silantes GmbH; 120106100; >98 atom%) (Table S2 at https://doi.org/10.6084/m9.figshare.22280632). Briefly, DNA was amplified for 30 cycles by adding 0.5 µL of Phusion high-fidelity (HF) DNA polymerase (NEB; M0530S), 10 µL of 5× Phusion HF Buffer, 1 µL of 10 mM dNTPs (final conc. labeled/unlabeled mixture), 2.5 µL each of 10 µM forward and reverse primers, and 31.5 µL of nuclease-free water. PCR products were purified using AMPure XP beads (Beckman Coulter; 63880) and pooled in equimolar ratios to create a set of pre-centrifugation DNA spike-ins. These pre-centrifugation spike-ins were added at 1% by mass of the DNA mixture, e.g., 10 ng of synthetic fragment pool added to 1 µg of microbial DNA mixture.
Gradient separation, sequin addition, and fraction purification
Following Nuccio and colleagues (26), samples were centrifuged at 44,000 RPM (190,600 g) for 120 h at 20°C in a VTi 65.2 Rotor (Beckman Coulter; 362754). For each sample, 24 fractions of 220 µL were collected into a 96-well plate using an Agilent 1260 fraction collector running at a flow rate of 250 µL/min, with mineral oil as the displacement fluid. Fraction density was determined using a Reichert AR200 refractometer.
Before purifying DNA from CsCl fractions, an additional set of 80 synthetic DNA fragments, or
After sequin addition, DNA was recovered by adding a 250-µL solution of 36% polyethylene glycol (PEG) 6000 and 1.6 M NaCl to each fraction and incubating overnight at 4°C. Plates were centrifuged at 3,214 ×
Sequins were added to each fraction before PEG precipitation and DNA quantification steps; therefore, the amount added was based on the expected sample DNA concentrations. Tailoring sequin additions to actual sample DNA concentrations, as opposed to estimates, is preferable to ensure optimal coverage in sequencing data. After completing analysis of the amended microbiome, we sought to improve sequin additions for future studies by measuring DNA levels before PEG precipitation when DNA was still in concentrated CsCl. Additional details are provided in the Supplementary Information.
Library creation and sequencing
Sequencing libraries were generated from the 16 middle fractions of each sample using Nextera XT v2 chemistry (Illumina) with 12 PCR cycles. Concentrations and size distributions of each library were determined on a Fragment Analyzer (Agilent). Libraries were pooled at equal molar concentrations within the range of 400–800 bp, and the pool was size selected to 400–800 bp using a Pippin Prep 1.5% agarose, dye-free, internal marker gel cassette (Sage Science). For each library, 2 × 150 bp paired-end sequencing was performed on the Illumina NovaSeq platform using S4 flowcells (Table S6 at https://doi.org/10.6084/m9.figshare.22280632).
Metagenome assembly and binning
Raw reads were filtered and trimmed using RQCFilter2 software according to the standard JGI procedures (https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/data-preprocessing/). Then, one of four strategies was used to assemble contigs: (i) an assembly of the unfractionated SIP sample with metaSPAdes (v3.15.2) (56); (ii) a single fraction assembly with metaSPAdes (371 assemblies); (iii) a single sample co-assembly with metaSPAdes (co-assembly of all fractions sequenced for a single SIP replicate sample, 24 assemblies); and (iv) an experiment-wise co-assembly with MetaHipMer (v2.0.1.2) (assembly of all fractions across all replicates) (33). Assembly and genome mapping parameters are reported in the Supplementary Methods. We generated 397 assemblies in total. Quality assessment metrics for each assembly were calculated using QUAST (v5.0.2) (MetaQUAST mode) (Data Set S3) (57). Each assembly was then independently binned with MetaBAT (v2.12.1) (58). For each generated MAG, we used GTDB-Tk (v2.0.0) (GTDB R95) (59) to assign a taxonomic classification. To assess the quality of MAGs, we used CheckM (v1.1.3) (60) and QUAST (v5.0.2) (61). The MetaHipMer combined assembly was annotated using the JGI metagenome annotation workflow (58) and is available through IMG/
MAG deduplication and mean scaffold coverage calculations
Medium- and high-quality MAGs recovered from all assembly strategies were deduplicated to remove redundant versions of each draft genome (35). The genome-wide average nucleotide identity (gANI) and the alignment fraction (AF) were calculated for each possible MAG pairwise comparison (36). Next, the lowest pairwise values of gANI and AF were utilized for each MAG comparison, followed by clustering using single linkage to group MAGs based on species-level delineations (e.g., gANI ≥96.5% and AF ≥30%) as defined by Varghese and colleagues (36). MAGs that did not cluster with other MAGs were considered singletons. Following clustering, we used completeness, contamination, and total length values to select a single representative MAG for each cluster. Sequences of all spike-ins and sequins were concatenated with the final set of MAG contigs, and this contig set was then used as a reference for read mapping across all density fractions (see Supplementary Methods). The average contig coverage of MAGs, spike-ins, and sequins in each fraction was calculated and used in the downstream analysis.
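The deduplication logic can be illustrated with a small single-linkage clustering sketch. It assumes that each pair of MAGs has two directional gANI and AF values and that the lower of each is compared against the thresholds, as described above. The ranking rule in `pick_representative` (highest completeness, then lowest contamination, then greatest length) is a hypothetical choice, since the text names the criteria but not how they were combined. All identifiers and example values are illustrative.

```python
def cluster_mags(mag_ids, pairwise, min_ani=96.5, min_af=30.0):
    """Single-linkage clustering of MAGs using a union-find structure.

    pairwise maps (mag_a, mag_b) -> (ani_ab, ani_ba, af_ab, af_ba).
    """
    parent = {m: m for m in mag_ids}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for (a, b), (ani_ab, ani_ba, af_ab, af_ba) in pairwise.items():
        # Use the lowest of the two directional values for each metric.
        if min(ani_ab, ani_ba) >= min_ani and min(af_ab, af_ba) >= min_af:
            parent[find(a)] = find(b)

    clusters = {}
    for m in mag_ids:
        clusters.setdefault(find(m), []).append(m)
    return sorted(sorted(c) for c in clusters.values())

def pick_representative(cluster, stats):
    """Hypothetical rule: best completeness, then least contamination, then longest."""
    return max(cluster, key=lambda m: (stats[m]["completeness"],
                                       -stats[m]["contamination"],
                                       stats[m]["length"]))

pairwise = {
    ("A", "B"): (98.0, 97.5, 60.0, 45.0),  # linked
    ("B", "C"): (97.0, 96.8, 40.0, 35.0),  # linked
    ("A", "C"): (95.0, 95.0, 50.0, 50.0),  # below the gANI threshold
    ("C", "D"): (99.0, 99.0, 20.0, 25.0),  # below the AF threshold
}
clusters = cluster_mags(["A", "B", "C", "D"], pairwise)
stats = {
    "A": {"completeness": 92.0, "contamination": 3.0, "length": 3_100_000},
    "B": {"completeness": 95.0, "contamination": 4.0, "length": 3_000_000},
    "C": {"completeness": 95.0, "contamination": 1.5, "length": 2_900_000},
}
representative = pick_representative(clusters[0], stats)
```

With single linkage, A and C end up in the same cluster through B even though their direct comparison falls below the gANI threshold, and D remains a singleton.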
Quality control of SIP data using pre-centrifugation spike-ins
Before performing SIP analysis, we first removed mishandled samples from our dataset. For this purpose, we identified the peak of the absolute concentration distribution across the density gradient for each labeled pre-centrifugation spike-in. If the order of spike-in peaks along the density gradient did not match the order expected from the theoretical density of each spike-in (given its GC content and 13C/12C ratio), then the sample was considered potentially problematic and removed from the analysis.
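A minimal sketch of this check is shown below, assuming each spike-in is summarized by the buoyant density of the fraction where its abundance peaks and that peaks must appear in non-decreasing order of expected density. Treating ties (two spike-ins peaking in the same fraction) as acceptable is an assumption, and the spike-in names and profiles are hypothetical.

```python
def peak_density(profile):
    """Return the buoyant density of the fraction with the highest abundance.

    profile is a list of (buoyant_density, abundance) tuples for one spike-in.
    """
    return max(profile, key=lambda point: point[1])[0]

def spikein_order_ok(profiles, expected_order):
    """Check that spike-in peaks follow the expected light-to-heavy order."""
    peaks = [peak_density(profiles[name]) for name in expected_order]
    return all(lighter <= heavier for lighter, heavier in zip(peaks, peaks[1:]))

expected_order = ["low_gc", "mid_gc", "high_gc_13c"]  # hypothetical spike-in names

good_sample = {
    "low_gc":      [(1.68, 9.0), (1.70, 2.0), (1.72, 0.5)],
    "mid_gc":      [(1.68, 1.0), (1.70, 8.0), (1.72, 1.0)],
    "high_gc_13c": [(1.68, 0.2), (1.70, 1.5), (1.72, 7.0)],
}
# Same sample with two spike-in profiles swapped, as a mix-up might produce.
bad_sample = dict(good_sample,
                  low_gc=good_sample["high_gc_13c"],
                  high_gc_13c=good_sample["low_gc"])

good_flag = spikein_order_ok(good_sample, expected_order)
bad_flag = spikein_order_ok(bad_sample, expected_order)
```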
Estimating the absolute abundance of MAGs across density fractions
To determine the extent of isotope incorporation into genomes, it is first necessary to measure genome abundance across the density gradient. We explored several ways to measure genome abundance in the SIP dataset, which are implemented as part of the
First, we obtained absolute concentrations of genomes across the density gradient using the approach proposed by Hardwick and colleagues (30), in which sequins were used as internal reference standards to scale coverages into absolute concentrations. Briefly, the average MAG coverage within a given fraction (metagenome) was scaled into units of molarity using regression analysis based on known molarity of 80 sequins and their average coverages. Molar concentrations of the sequins in the added standard mixture were obtained from the manufacturer (Garvan Institute of Medical Research). For regression analyses, we first tested both ordinary least squares regression and robust linear regression. When using ordinary least squares regression, we also tested Cook’s distance filtering to remove outliers at a threshold of Cook’s distance <
In addition to sequin-based normalization, we also explored genome abundance estimation using: (i) unscaled coverage; (ii) relative coverage; (iii) absolute abundance as per the approach of Greenlon and colleagues (25); and (iv) absolute abundance as per the approach of Starr and colleagues (24). Unscaled coverages represented raw average MAG coverage values that were directly used in the estimation of mean weighted BDs and AFE. Relative coverage was estimated as follows: (coverage of an MAG within a fraction)/(sum of coverages of all MAGs within a fraction).
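The relative coverage calculation, and the scaling of relative coverage by a fraction's total DNA concentration used in earlier SIP metagenomic studies, can be sketched as follows. This is an illustration only and is not attributed to either cited implementation; the function names, units, and example values are hypothetical.

```python
def relative_coverage(fraction_coverage):
    """Coverage of each MAG divided by the summed coverage of all MAGs in the fraction."""
    total = sum(fraction_coverage.values())
    return {mag: cov / total for mag, cov in fraction_coverage.items()}

def dna_scaled_abundance(fraction_coverage, dna_concentration):
    """Relative coverage multiplied by the fraction's total DNA concentration."""
    return {mag: rel * dna_concentration
            for mag, rel in relative_coverage(fraction_coverage).items()}

# Hypothetical fraction with two MAGs and a measured DNA concentration of 4.0 ng/µL.
coverage = {"MAG_1": 30.0, "MAG_2": 10.0}
rel = relative_coverage(coverage)
scaled = dna_scaled_abundance(coverage, dna_concentration=4.0)
```

Unlike sequin-based calibration, the DNA-scaled estimate inherits any fraction-to-fraction differences in DNA recovery, because the concentration is measured after purification.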
Estimation of atom fraction excess of MAGs
The qSIP model (Equation 1) or ΔBD model (Equation 6) can be used to estimate the AFE of genomes. Briefly, the AFE of organism
(Equation 1-A)
(Equation 1-B)
where
(Equation 2)
(Equation 3)
(Equation 4)
where
The weighted average buoyant density (Wij) is then estimated as
(Equation 5)
where
The estimation of AFE based on the ΔBD model can be represented as
(Equation 6)
where
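As a rough illustration of how a weighted average buoyant density feeds into an AFE estimate, the sketch below assumes the widely used qSIP relationships for 13C: GC content inferred from the unlabeled density, molecular weight as a linear function of GC content, and AFE as the observed molecular weight gain relative to the maximum possible gain. The numeric constants are those commonly given for the 13C qSIP model and are assumptions here, not values taken from this article; they should be checked against Equations 1 to 5. Bootstrapped confidence intervals, which qSIP uses to call a genome labeled, are omitted.

```python
def weighted_bd(densities, abundances):
    """Abundance-weighted mean buoyant density of one genome in one replicate."""
    total = sum(abundances)
    return sum(d * a for d, a in zip(densities, abundances)) / total

def qsip_afe_13c(w_light, w_lab):
    """Estimate 13C atom fraction excess from unlabeled and labeled weighted BDs.

    All constants are assumed from the commonly published 13C qSIP formulation.
    """
    gc = (w_light - 1.646057) / 0.083506               # GC fraction from density
    m_light = 0.496 * gc + 307.691                     # unlabeled molecular weight
    m_heavy_max = m_light + 9.974564 - 0.4987282 * gc  # fully 13C-labeled weight
    m_lab = ((w_lab - w_light) / w_light + 1.0) * m_light
    natural_abundance_13c = 0.01111233
    return (m_lab - m_light) / (m_heavy_max - m_light) * (1.0 - natural_abundance_13c)

# Hypothetical genome: symmetric distribution in the control, shifted in the treatment.
densities = [1.68, 1.70, 1.72, 1.74]
w_control = weighted_bd(densities, [1.0, 2.0, 1.0, 0.0])
w_treatment = weighted_bd(densities, [0.0, 1.0, 2.0, 1.0])
afe = qsip_afe_13c(w_control, w_treatment)
```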
Identifying isotope incorporators using HR-SIP and MW-HR-SIP
To run the high-resolution SIP (HR-SIP) and moving-window HR-SIP (MW-HR-SIP) methods, we used the MAG abundances obtained from the sequin normalization approach. Differential abundances based on absolute abundance for MAGs in the heavy fractions in the treatment conditions were compared to control conditions using HR-SIP and MW-HR-SIP using the HTSSIP R package (31). For HR-SIP, a heavy BD window was set from 1.71 g/mL (as the theoretical peak of
Subsampling of
Reads that mapped to
Copyright © 2023 Vyshenska et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
ABSTRACT
Stable isotope probing (SIP) facilitates culture-independent identification of active microbial populations within complex ecosystems through isotopic enrichment of nucleic acids. Many DNA-SIP studies rely on 16S rRNA gene sequences to identify active taxa, but connecting these sequences to specific bacterial genomes is often challenging. Here, we describe a standardized laboratory and analysis framework to quantify isotopic enrichment on a per-genome basis using shotgun metagenomics instead of 16S rRNA gene sequencing. To develop this framework, we explored various sample processing and analysis approaches using a designed microbiome where the identity of labeled genomes and their level of isotopic enrichment were experimentally controlled. With this ground truth dataset, we empirically assessed the accuracy of different analytical models for identifying active taxa and examined how sequencing depth impacts the detection of isotopically labeled genomes. We also demonstrate that using synthetic DNA internal standards to measure absolute genome abundances in SIP density fractions improves estimates of isotopic enrichment. In addition, our study illustrates the utility of internal standards to reveal anomalies in sample handling that could negatively impact SIP metagenomic analyses if left undetected. Finally, we present
IMPORTANCE
Answering the questions,