Untargeted Exometabolomics Provides a Powerful

Full text

Turn on search term navigation

1. Introduction

The Arctic is warming at least twice as fast as any other landscape on the planet and stores nearly half the Earth’s terrestrial carbon (C) in soil organic matter (SOM) associated with permafrost soils [1,2,3]. Rising temperatures have accelerated permafrost degradation, resulting in physical, hydrological, and chemical shifts across the landscape that have led to previously frozen SOM suddenly becoming available for microbial decomposition [4,5,6]. Mobilizing even a fraction of this C-rich SOM via these physical and biochemical processes may accelerate the release of greenhouse gases (GHGs) (i.e., CO₂, CH₄, N₂O) from the landscape, creating a significant positive feedback to climate change [7,8,9,10]. Reliably predicting where (hotspots) or when (hot moments) this C loss is most likely to occur across the Arctic landscape, however, depends on multiple interacting factors. These include landscape heterogeneity [11,12,13] and the associated shifts in hydrology (topography) [14,15], vegetation [16,17], microbial community composition [18]. Furthermore, the chemical composition of SOM, or its inherent availability to soil microbial communities, varies strongly with landscape heterogeneity [19,20,21], and this molecular-scale information is currently poorly understood and poorly characterized in process-based models [22,23,24].

Polygonal tundra, a primary landscape type in Arctic systems, is often dominated by characteristic features called ice-wedge polygons that form when freeze-thaw cycles physically move the soil. This creates a unique and visible microtopography across the landscape that has been shown to strongly influence hydrology, vegetation, and microbial community structure [16,18,25]. There are different polygon types including low-centered polygons (LCP) that have a topographically low and generally wet center and high-centered polygons (HCP) that have topographically higher and dryer centers (Figure 1) [25]. Typically ~5–20 m in diameter, these two endpoints of a gradient of topographic variation can act as distinct, repeatable units whose metabolomic composition can be used for studies involving scaling measurements and initializing terrestrial model integrations [26]. Recently, numerous studies have reported a strong relationship between polygon type, vegetation, and biogeochemistry (i.e., inorganic ions, pH, redox potential, bulk C/N) [14,15,27,28], especially in the organic-rich active layer [29].

Low molecular weight (LMW) dissolved organic matter (DOM) is the water-soluble fraction of SOM most available to microbial decomposers and is a complex and dynamic mixture of small molecules (<1500 Da) of both biotic and abiotic origin (i.e., plant root exudates, products of microbial metabolism or turnover, photodegradation products) [30]. In laboratory incubations of Arctic soils, LMW DOM composition has been shown to be sensitive to variations in both temperature [31,32,33] and moisture [34]. Analogously, the structure and function of soil microbial communities have been shown to be strongly influenced by the molecular composition of this highly labile substrate pool [35,36,37]. Despite it representing an information-rich chemical fingerprint of biological function in soil and, thus, a potential indicator of C vulnerability across space that could help reduce uncertainty in process-based predictive models of C cycling [23,24], the spatial variability of LMW DOM composition across Arctic polygonal tundra landscapes is largely unknown.

Some of this uncertainty stems from analytical challenges in LMW DOM detection, isolation, and quantitation. While there is not a single analytical platform that can characterize all LMW DOM species in a given soil sample, liquid chromatography (LC) coupled with high-resolution mass spectrometry (HRMS) and a soft ionization technique such as electrospray ionization (ESI) is a powerful approach that allows for the detection and relative quantitation of thousands of molecular species in a single measurement [38,39,40,41,42]. Here, we use two complementary LC phases, reversed-phased (RP) and hydrophilic interaction chromatography (HILIC), and both MS-polarities (positive- and negative-ion mode) at the nano-scale, as this was recently shown to be an effective approach to detect and resolve an expanded range of LMW DOM in soil [43]. We applied this untargeted approach in soil organic horizons with contrasting aboveground vegetation community composition, obtained from the centers of a high- and low-centered polygon, to elucidate the molecular distribution and differential abundances of LMW DOM across space and yield new insights into the diversity of organic substrates available to plant and microbial communities in these polygonal tundra systems.

2. Materials and Methods

2.1. Study Site and Sample Description

A detailed description of the study site, soil cores, and sample collection can be found in the Supporting Information. In brief, four soil cores were collected from a polygonal tundra landscape on the northern coastal plain of Alaska on the Barrow Environmental Observatory. A total of two cores were collected from the organic soil horizon to a depth of 15 cm from the center of an LCP and two from the center of an HCP. The aboveground vegetation at these locations varied by polygonal type [44]. In order to assess differences in LMW DOM potentially associated with aboveground vegetation, in each polygonal center, we took one soil core from an area dominated by sedge Carex aquatilis and one soil core from an area dominated by sedge Eriophorum angustifolium. The cores were shipped frozen to Oak Ridge National Laboratory (ORNL, Oak Ridge, TN, USA) where they were stored at −80 °C until processing. Using a band saw, each frozen core was cut into three 5-cm sections (top, middle, and bottom) to enable the evaluation of any within-horizon variation. The sections were thawed overnight at 4 °C and then homogenized by hand removing all inorganic, mineral, or live plant material [45]. Live roots were also removed from each section, dried, and weighed to evaluate any correlation between LMW DOM abundance and root biomass. A subsample from each of the core sections (n = 12) was taken for determination of water content, total carbon and nitrogen, and total organic carbon (Table S1) using conventional techniques, which are described in detail in the Supporting Information.

2.2. Soil Extraction and Sample Preparation

Biological replicates were obtained by extracting each core section in triplicate (n = 36, nine per core), along with three controls (extraction with no soil), using the procedure we previously optimized for these soils [43]. Briefly, a single aqueous extraction (LC/MS-grade H₂O, pH = 5, 1:3 w/v, 1 h) was employed to maintain high throughput and extract compounds that would be found free in the soil solution and bioavailable to both plant and microbial communities [38,46]. A pooled quality-control (QC) sample, consisting of equal volumes of all 36 samples and an internal standard, 4-amino-6-methyl-8-(2′-deoxy-β-D-ribofuranosyl)-7(8H)-pteridone (6-MAP), was also prepared to monitor instrument performance and assist with normalization procedures in post-processing [47]. Extracts were centrifugal filtered (Amicon Ultra, 3 kDa, 4 °C, 15 min), concentrated (12×) using a SpeedVac Concentrator, and then separated into two aliquots. One aliquot was further evaporated to near-dryness and brought back up in 95:5 (v/v) acetonitrile:water, creating one organic and one aqueous aliquot per sample for analysis by HILIC and RP-LC, respectively. Extracts were stored at −80 °C until LC/MS analysis.

2.3. Instrumentation and LC/MS Data Collection

Samples and controls were thawed and prepared immediately prior to injection. Separations were performed using a Dionex UltiMate 3000 HPLC pump (ThermoFisher Scientific, Waltham, MA, USA), and 100 µm i.d. fused-silica (Polymicro Technologies, Phoenix, AZ, USA) columns laser-pulled and pressure-packed in-house to 20 cm with either Kinetex C18 material (5 µm, 100 Å, Phenomenex, Torrance, CA, USA) or zwitterionic, polymer-based ZIC-pHILIC material (5 µm, Sequant, bulk material provided by EMD Millipore (Burlington, MA, USA). Optimized mobile phase compositions (Table S2) and gradients (Table S3) can be found in the Supporting Information. Quality controls were run every six injections and samples were randomized to reduce instrument-derived variation. Technical blanks representing the column re-equilibration conditions were also run regularly to monitor background ions and carry-over between samples.

To obtain high-mass accuracy measurements, columns were coupled to an LTQ-Orbitrap Velos Pro mass spectrometer (ThermoFisher Scientific) equipped with a nano-electrospray ionization source (Proxeon, Syddanmark, Denmark) operated in either positive- or negative-ionization mode under direct control of the XCalibur software, v2.2 SP1.48 (ThermoFisher Scientific) resulting in four separate LC/MS analyses per sample (HILIC +/− and RP +/−, n = 144). Full precursor (MS¹) scans were acquired over a mass range of 50–1500 m/z in centroid mode at a resolving power of 30,000. Fragmentation spectra (MS²) were also collected to provide an additional dimension for annotation by database matching or eliminating candidates for features that match to multiple hits. Additional details about chemicals, instrumentation settings, calibration, and standards, can be found in the Supporting Information.

2.4. Untargeted LC/MS Data Processing

Raw LC/MS files were processed using the freely available MZmine (v2.30) software [48,49]. Detailed descriptions of each of the modules and parameters used for peak detection (Table S4), chromatogram alignment, peak list generation, and annotation (Table S5) are in the Supporting Information. Briefly, chromatograms were built using high-resolution MS¹ and MS² data and retention time (RT) within a specified m/z and RT tolerance, resulting in an assigned peak area. All chromatograms within each LC/MS condition were aligned across the sample set (including blanks and controls) to assist with normalization and relative quantitation. Although a soft-ionization technique, electrospray ionization can create in-source fragments, adducts, or ion complexes that can complicate spectral analysis and annotation. Using the identification module in MZmine, each spectrum was searched for adducts, complexes, and fragments using specified RT and m/z thresholds. The proportion of each LC/MS dataset identified as either adducts, complexes, or fragments did not exceed ~10% (Figure S1).

2.5. Data Filtering, Normalization, and Statistical Analyses

To evaluate the ability of the LC/MS approach to detect quantitative variations in LMW DOM abundance across space, in addition to peak detection and alignment, it is also important to remove as much noise, background signal, and unwanted variation as possible. To accomplish this, multiple conservative LC/MS-based metabolomic data processing techniques were applied, including normalization procedures, data transformations to remove intragroup batch effects (Figure S2), a blank/control correction, and reproducibility and abundance thresholds [40,41,50]. This resulted in a matrix of features—defined here as a unique RT, MS¹ m/z, and MS² fragmentation spectrum with a corresponding peak height (intensity) and a peak area. Any duplicate features or features that had zero peak area after normalization were also removed, resulting in a matrix of high-quality features (HQFs). The number and complexity of HQFs detected by each LC/MS condition were used to evaluate LMW DOM coverage, measurement depth, and the qualitative and quantitative reproducibility across samples by comparing the accurate mass of the corresponding [M+H]⁺ or [M-H]⁻ molecular ion and the peak area for each feature. Only the HQFs that were observed in at least three replicates per core were carried on to subsequent quantitative analyses. This step helps reduce the probability of false positives and creates a more conservative list of only the most reproducible and abundant HQFs to be compared between samples. Missing values were imputed for statistical analyses and various univariate and multivariate statistical analyses (e.g., Students t-test, ANOVA, PCA), and data visualization techniques (e.g., volcano plots, heat maps) were used to help identify clusters of features that were consistently and significantly different between samples for annotation and to examine the relative abundance differences between extraction replicates and polygon or vegetation types. Additional details about normalization procedures, selected statistical tests, and thresholds can be found in the Supporting Information.

2.6. Feature Annotation

Annotation of features that were consistently observed and significantly differentially abundant due to polygon type or vegetation was carried out in a three-step procedure described in detail in the Supporting Information. In brief, features were compared against a series of databases and assigned putative chemical formulas using high mass accuracy measurements, Kind and Fiehn’s “Seven Golden Rules”, and parameters modified from Kujawinski and Behn’s compound identification algorithm (CIA) for small molecules [51,52,53]. Databases included KEGG [54], METLIN [55], MMCD [56], PubChem [57], HMDB [58], LipidMaps [59], or Plant Cyc [60]. Features that matched to multiple hits in a database or multiple formulas were manually scrutinized in an iterative approach by assessing high-resolution mass spectral data for consistent fragmentation profiles.

3. Results

Given that this was the first application of this untargeted exometabolomics technique across multiple Arctic sampling sites, we began with an evaluation of the analytical performance of the approach. Due to the large number of samples and data generated, we then applied a series of data mining techniques and multivariate statistical analyses to reduce the dimensionality of the data in order to discriminate ecologically relevant features and compositional variations due to polygon type or vegetation. Features that were significantly and reproducibly differentially abundant between sites were further investigated and putatively annotated using high-mass accuracy MS data and database searching.

3.1. Evaluation of Analytical Performance Across Multiple Sites

Given that a detailed analysis of the analytical performance of this untargeted LC/MS approach in Arctic soils was conducted previously [43], only a few primary figures of merit—measurement depth, reproducibility, and LMW DOM coverage—were examined here. All data processing, filtering steps, and statistical analyses were conducted separately for each LC/MS condition (HILIC +/−, RP +/−) to eliminate any confounding effects such as different ionization efficiencies or noise levels for example. Across the four conditions, 13,673 molecular species (RT, MS¹, and MS²) were detected, aligned, and exported for data filtering and analysis (Table 1).

A preliminary PCA for each LC/MS dataset, prior to any filtering, normalization, or statistical procedures, revealed a clear separation between blanks, controls, and samples (Figure S3), indicating that variations observed between samples were not experimentally derived but instead due to biogeochemical variation. However, ~18% of the aligned peaks were observed in nearly all the runs including the blanks and controls (Figure S4a), suggesting these were background signals from the sample preparation procedures or LC/MS analyses. After removing these, as well as any zeros or duplicate features, 11,258 HQFs remained for downstream analyses. When we plotted the frequency at which the HQFs were observed across the dataset, we noted a recurrent trend in the data where the number of features observed increased sharply at approximate intervals of nine (Figure S4b), corresponding with the sample set size for each core (9 extracts). These results indicate that the data filtering protocol employed here effectively reduces the false discovery rate (FDR) and increases the proportion of LMW DOM analytes represented.

These results also suggest that a common set of LMW DOM features exists within each core, and across all four cores, despite variations in aboveground vegetation or topography. Indeed, when we examined the overlap between the four cores for each LC/MS condition using the neutral mass for each [M+H]⁺ or [M-H]⁻ singly charged precursor ion within 0.005 Da, on average there was a 37% overlap in the features detected (Figure S5). Contrastingly, on average, 15.5% of the features detected were found to be unique to each core indicating there was unique biogeochemical activity within each core as well, and that the optimized LC/MS approach employed here was sensitive enough to detect these subtle variations between sites.

To examine the variability of LMW DOM across each of the LC/MS conditions prior to data filtering and normalization and evaluate the reproducibility of the untargeted approach across biological replicates, we built a correlation matrix using the calculated Pearson coefficients for each extract (Figure 2) and PCA plots using the unique identifiers and peak areas for each HQF (Figure S7). While there was some variability among replicates, which was more noticeable in the RP datasets, in general, there was a fair amount of correlation across each of the nine samples within each core. Interestingly, while it may be expected that aboveground vegetation dictates belowground SOM composition, the cores from the same polygon type were more highly correlated to one another than the cores with the same aboveground vegetation (Figure 2). The PCA components accounted for 58, 51, 54, and 61% of the variation across the HILIC (+), HILIC (−), RP (+), and RP (−) datasets, respectively, indicating both polygon type and vegetation have a major effect on the LMW DOM composition.

3.2. Impacts of Polygon Type and Vegetation on LMW DOM Abundance

To reduce the dimensionality of these data and identify features that were significantly differentially abundant between cores, we performed pairwise comparisons by t-test and fold change analysis between cores of the same polygon or vegetation type, followed by an ANOVA to determine features that were in higher relative-abundance uniquely due to polygon type or vegetation (p-value < 0.001, FC > 4). To visualize these differentially abundant features, we first used volcano plots to isolate the features that had the greatest FC and lowest p-value between conditions (Figure 3). There were more features found in higher relative abundance in the Eriophorum cores versus the Carex cores and at the HCP sites versus the LCP sites.

To then evaluate the quantitative reproducibility of the differentially abundant features, an analysis of the coefficient of variance (CV%) for the peak areas across replicates revealed that 95% of these differentially abundant features showed acceptable reproducibility (CV < 10%, Figure S8) indicating the optimized data collection and processing techniques were robust and that the data filtering protocols were conservative, selecting for LMW DOM features that were consistently detected across replicates. It is important to note that some variability observed among replicates is not unexpected. Despite the subsamples of soil being relatively small (4 g), it has been well-established that LMW DOM composition and abundance can vary at even the micro-site or aggregate scale (10 s–100 s if µm) [61,62]. That the untargeted approach applied here can detect these subtle differences is an added benefit, as it demonstrates the sensitivity of the technique to detecting variation in the abundance of LMW DOM across space and capturing both the biotic and abiotic impacts on this pool.

3.3. Molecular Characterization of Differentially Abundant LMW DOM Features

We further investigated the relationship between polygon type or vegetation and LMW DOM abundance by directly contrasting the differentially abundant LMW DOM features using molecular data obtained from the high-resolution LC/MS measurements. Differentially abundant features ranged in molecular weight (~56–900 m/z) and polarity, exhibited by their elution across the full retention time window for each LC/MS condition (Figure S9). The m/z distribution did not vary appreciably between cores (Figure S10) or between the operationally defined depth increments we employed (Figure S11) as has been found in previous work [43]. These data support that the LC/MS conditions were not biased toward any particular class of compounds and that molecular weight alone is not adequate at differentiating LMW DOM abundance across space and that additional molecular information is required.

Of the 521 differentially abundant features, 217 (42%) were assigned molecular formulas while 304 (58%) did not meet the criteria for a confident assignment or were possible adducts, complexes, or fragments identified by the MZmine modules during annotation. As described above, we estimate approximately 10% may have been adducts; including sodium (Na⁺) or chloride (Cl⁻) adducts as these are commonly seen in the characterization of OM using positive- and negative-ESI, respectively [63,64]. Additionally, these soils have been shown to have high iron concentrations [34,65,66,67], and since organo-iron complexes can be soluble in soil water, they may have been extracted here as part of the LMW DOM pool. Because organo-metal complexes generally dissociate upon ionization, however, they would not appear in the mass spectrum, or would appear as an ion ([M-Fe+H]⁺) less the mass of iron (55.9349 m/z) requiring a manual search of the data to annotate these. Across the 217 assigned features, the average mass error was 0.65 ppm and the average molecular weight was 379.9353 m/z [68].

Elemental data from formula assignments were then used to assign a biomolecular compound class—lipids, proteins (amino acids and amino sugars), lignins, carbohydrates, unsaturated hydrocarbons, condensed aromatics (phenolics), tannins, and aliphatics—to each differentially abundant feature based on their H/C and O/C ratios (Figure 4) in a van Krevelen plot [69,70]. There is a clear density of formulas in the low O/C and high H/C regions of the plot, indicating an abundance of aliphatic compounds—such as lipids, sugars, and amino acids—possibly derived from microbial biomass or root exudates and plant leachates. The high presence of formulas consistent with phenolics, lignins, and proteinaceous (i.e., peptides, amino sugars) material is indicative of microbially digested lignocellulosic biomass or freshly deposited plant material.

New methods have recently been proposed to improve biomolecular assignment of molecules from ecological samples, for example, by including N and P as well [71]. Since N-containing compounds made up over 70% of the differentially abundant features detected at each polygon and are vulnerable to microbial degradation, here, we have also included a van Krevelen analysis between the two polygon types using the N/C ratio (Figure 5); though this technique may also be used with the S/C or P/C ratios to visualize the distribution of heteroatoms across LMW DOM features detected. Using this approach, the results show a separation between the N-containing features at the LCP and HCP sites. More features with a low N/C ratio (N/C < 0.2) and high H/C content (H/C > 1.5), which are consistent with lipid-like compounds, were found at the HCP site, consistent with our findings from above. Features in the region N/C < 0.1 and H/C < 1.5, indicating a high number of amino groups and the presence of phytochemicals (bioactive plant compounds) [71], were similar between the two polygon types. Features with higher N/C > 0.2, consistent with LMW DOM compounds having secondary or tertiary amines (i.e., alkaloids, cyclic amines), were more dominant at the LCP site.

For a more detailed view of the LMW DOM chemistry at these sites, the average molecular properties for the differentially abundant features that were assigned formulas have been reported (Table 2). Due to polygon type being a stronger predictor of LMW DOM abundance, features that were in higher relative abundance at either the HCP or LCP sites have been highlighted. A selection of LMW DOM features that had the highest fold change between sites have been further characterized in Table 3.

In addition to the volcano plots, another more detailed way we examined the differentially abundant features was to distinguish clusters of features that vary similarly across cores using two-way hierarchical clustering with the normalized log₂ peak areas and a unique identifier for each feature (Figure 6). This allows for the visualization of LMW DOM features that were consistently and similarly varying across space indicating hotspots of biogeochemical activity. Consistent with analyses described above, the cores clustered into two main groupings corresponding to polygon type, LCP or HCP, samples 1–9 with 19–27 and samples 10–18 with 28–36, respectively. A total of four clusters have been highlighted to show the subtle, but consistent and significantly different variations between cores due to polygon type, vegetation, or in some cases, depth. For example, cluster 1 shows 76 features that are more abundant across all of the cores except for the Eriophorum core at the LCP site where those features were found in lower relative abundance. Cluster 2 shows 71 features that were depleted in both LCP cores but not the HCP cores. Cluster 3 indicates that 67 features were depleted in the Eriophorum core at the HCP site, and cluster 4 shows 44 features that were in higher relative abundance in the LCP cores, but that this varied with depth in the Carex core at that site.

In cluster 1, of the 75 differentially abundant features, 52 (69%) were assigned a chemical formula (average mass error = 0.406 ppm) based on high mass accuracy MS¹ measurements, and 4 others that were not assigned a chemical formula but did match to a database (<5 ppm), for a total of 56 (75%) features annotated in the cluster (Table S6). When a compound was annotated by both elemental formula assignment and database matching, most of the time the formulas matched. However, there were instances where different formulas were assigned to the same molecule, which occurred twice in cluster 1, indicated by the asterisks in Table S6. In these cases, we were able to use MS² fragmentation data to match to available data or eliminate incorrect assignments, highlighting the value of MS² data in providing information about both known (already in a database) and unknown compounds, and adducts or complexes. As an example, in the case of the [M-H]⁻ ion detected at 192.0527 m/z, characteristic neutral losses of formamide (-CH₃NO, 45.0214 Da) and multiple dehydrations (-H₂O, 18.0098 Da) were observed (Figure S12), indicating a structure consistent with glucuronamide, a monosaccharide derivative of beta-D-glucuronic acid, a common microbial metabolite involved in ascorbic acid synthesis [72].

It is important to note that although no formula or database match was made for the compounds at the bottom of Table S6, each of those features were reproducibly (n > 3) and reliably (S/N > 3) detected, and were robustly and conservatively determined to be significantly (p < 0.001) differentially abundant between samples. In addition, each feature has a reproducible RT and peak area, and both MS¹ and MS² high-mass accuracy measurements. This demonstrates the value of this information-rich signal and shows how it can be used diagnostically for both qualitative and quantitative inquiries.

4. Discussion

While the total number of features varied some between soil cores, overall, HILIC (+) detected the greatest number of HQFs across the four cores with 3929 (34.9%) followed by RP (+) with 3618 (32.1%), HILIC (−) with 2170 (19.3%), and finally RP (−) with 1541 (13.7%) (Table 1). This is likely due to the more favorable ionization conditions in positive-mode, and more reproducible retention on the HILIC columns for the small, highly polar compounds that dominate LMW DOM [73]. Despite some differences in performance, the optimized LC/MS conditions were still highly orthogonal with just 94 (2%) HQFs observed by all four conditions (Figure S6). These results confirm that the dual-LC, dual-polarity approach is effective at expanding coverage of the LMW DOM pool and is sensitive enough to capture both shared features as well as those unique to Arctic soils obtained from different sampling sites with varying aboveground characteristics.

Soil cores from the same polygon type were more highly correlated to one another than the cores collected in areas of each polygon type that were dominated by the same aboveground vegetation (Figure 2). This suggests polygon type (topography) is a stronger predictor of LMW DOM abundance than vegetation cover at this scale and that it may be a useful scaling parameter to connect biogeochemical measurements with landscape properties (i.e., thaw depth, hydrology) [18,26,29].

The lower abundance of LMW DOM at the LCP site (Figure 3) may be due to increased transport (horizontal or vertical) of LMW DOM out of the organic horizon [74,75], likely due to the lower topography and more saturated conditions, or increased microbial processing. The higher relative abundance of LMW DOM features in the Eriophorum cores suggests either an accumulation or an increase in the rate at which LMW DOM is made available, possibly due to higher root biomass (Table S1), which has been shown to enhance substrate availability [76]. Alternatively, this could indicate a depletion of LMW DOM in the Carex cores, which could be due to increased microbial processing (biogeochemical hotspot), or plant uptake, of DOM at those sites. Plant uptake of DOM has been observed in Arctic vegetation as a way for plants to overcome nitrogen limitation [77,78,79]. Further analysis of the molecular details or additional studies that include isotopic labeling or gas flux measurements could be conducted to verify increased mineralization of LMW DOM at the LCP and Carex sites.

As both Carex and Eriophorum are vascular plant species and decomposition is generally slowed in Arctic systems, an accumulation of lignified LMW DOM across the cores was expected (Figure 4). Most of the differentially abundant features between cores were more abundant in the Eriophorum cores and were dominated by formulas consistent with low O/C and high H/C (i.e., aliphatic) content. This may have been due to the higher root biomass present in those samples (Table S1) or, more likely, necromass found beneath the Eriophorum cores [80,81].

In contrasting the molecular details of the two polygon types, there were readily observable differences reflected in the LMW DOM pool. Consistent with our hypothesis above from the van Krevelen analysis, there are multiple lines of evidence to support a hotspot of increased microbial processing and C cycling at the LCP site. First, both the average m/z and double bond equivalents (DBE) were lower in the LCP cores, characteristic of SOM that has undergone microbial decomposition (Table 2). Second, the proportion of features characteristic of compounds with higher biodegradability—lipids, carbohydrates, aliphatics—were also lower in the LCP cores (Figure 4), suggesting they may have been preferentially degraded and released as GHGs [31,32,82]. Third, there was a higher relative abundance of tannins and other condensed aromatics at the LCP site, as shown in Figure 4 and by the higher aromaticity index (AI) and 25.3% tannin content as shown in Table 2, suggesting an accumulation of these more recalcitrant features. Finally, although LCP centers are generally more anaerobic due to saturated conditions, the average oxygen content (demonstrated by the average molecular formula, O/C, and O/S ratios) of the differentially abundant LMW DOM features at the LCP was higher than the HCP (Table 2), further supporting enhanced microbial processing of OM at the LCP site.

Additionally of note, although there were a similar number of chemical formulas detected in higher relative abundance at each polygon, the features at the HCP site were more chemically diverse as indicated by a more equitable distribution among the assigned compound classes (Table 2). One explanation for this is that although the aboveground vegetation in each core represented primarily a single species, the HCPs generally have higher plant diversity [44]. This has been associated with more diverse plant inputs into the soil and increased microbial diversity, in turn leading to a more diverse substrate pool [83].

Interestingly, of the assigned formulas at the HCP and LCP sites, 88% and 72%, respectively, contained N, suggesting root exudation of organic N and/or microbial processing of SOM may be an important process occurring at these sites, especially in the HCP cores. This could be a result of the priming effect. Because HCP polygons are drier [84], plant and microbial activity may be more limited. As such, vegetation may allocate more N belowground to try and stimulate microbial processing of organic matter to release nutrients for uptake [85,86]. Somewhat surprisingly, ~11% of the differentially abundant formulas across all four cores contained both sulfate and nitrate groups (O > 6), which are characteristic of secondary organic aerosols (SOAs) [87]. Secondary organic aerosols are formed in the atmosphere through a complex interaction of sunlight and volatile organic compounds that originate from industrial emissions, cars, burning biomass, or even vegetation [88]. They have been shown to be an important input of organic C to alpine systems where they influence a range of biogeochemical processes [89]. However, while they have been observed near Utqiaġvik before, it has generally been along the coastline or in the marine environment closest to anthropogenic activities [90]. To the best of our knowledge, this is the first evidence of water-soluble SOAs in polygonal tundra soils on the BEO. These results suggest that some portion of LMW DOM that is available for microbial processing or plant uptake is derived from volatile organic carbon precursors.

Among the LMW DOM features annotated in Cluster 1 by database matching, there were amino acids, plant hormones, microbial metabolites, lignin-like molecules, and DNA/RNA fragments/derivatives (Table S6). These data support that this approach can detect key compounds involved in biogeochemical cycling. For example, a urea derivative (N-hydroxymethyl urea, [M-H]⁻ detected at 89.0358 m/z), was found to be in higher relative abundance in every core except the Eriophorum core at the HCP site. As a key metabolite in N cycling (i.e., ornithine cycle), urea is produced/excreted when there is an accumulation of highly toxic ammonia. An accumulation of extracellular urea in these soils may suggest increased inorganic N availability. This example demonstrates the utility of this untargeted approach for targeting additional compounds of interest; here, those involved in the urea cycle (i.e., glutamate, glutamine, arginine, citrulline) and in elucidating ecologically relevant molecular information to be used in mechanistic modeling.

In conclusion, this study implemented a dual-LC, dual-polarity nano-LC/MS approach to examine the variation in LMW DOM relative abundance in soil cores with two contrasting aboveground vegetation profiles and polygon types. Taken together, these results reveal a complex picture of C and N cycling at these sites, yielding insights into the chemical processing and relative degradability of the LMW DOM features found across the Arctic landscape. These results support that a broad range of compounds with varying physicochemical properties and concentrations were detected by the optimized approach and that the untargeted platform is sensitive, robust, and reproducible even when applied across multiple cores from different sites across the landscape. We provide evidence that LMW DOM is a diverse and reactive pool, and while there were a common set of metabolites among the cores, there were significant differences observed between sites as well indicating LMW DOM may be an important indicator of biogeochemical variation across the landscape. In addition, the untargeted LC/MS approach was sensitive to variation at multiple scales. While polygon type was a strong predicter of LMW DOM composition and relative abundance, vegetation also had an impact, indicating LMW DOM provides a window into the dynamic and complex interactions between landscape topography, vegetation, and SOM cycling.

Furthermore, this study revealed evidence of enhanced microbial processing at the LCP and Carex sites demonstrating its ability to detect hotspots of biogeochemical activity across space. Of the 521 differentially abundant features detected, 217 were putatively annotated by formula assignment, database matching, and evaluating the fragmentation data. For some compounds, this is the first time they have been reported in Arctic soils, including the 11% of detected formulas consistent with secondary organic aerosols, although additional studies are needed to understand the relative importance of this process in these systems. With an average mass error of <1 ppm, these high-mass accuracy measurements combined with reproducible retention times and peak areas provide an information-rich chemical profile of LMW DOM features in polygonal tundra soils. Correlating these qualitative and quantitative variations with additional landscape-scale features (i.e., hydrology, gas fluxes) would yield additional insight into how this chemical signal may be used to predict various processes impacting C cycling in the Arctic.

Supplementary Materials

The following are available online at https://www.mdpi.com/2571-8789/5/1/10/s1: Table S1: Polygon soil core sample summary and corresponding TOC, TN, TC, C:N, and dry root weight results. Table S2: Optimized mobile phase conditions and additives for each LC phase and MS polarity, injection volume, and flow rates. Table S3: Optimized gradient conditions for nano-LC separations, for positive- and negative-MS modes on C18-RP and ZIC-pHILIC columns. Table S4: MZmine parameters used for the peak detection modules applied in the analysis of the polygonal tundra soil organic horizons. Table S5: MZmine parameters used for the peak list generation modules applied in the analysis of the polygonal tundra soil organic horizons. Table S6: “Cluster 1”, listing the differentially abundant LMW DOM features found in lower relative abundance in the Eriophorum—HCP core. Figure S1: Percent aligned peaks that were annotated my MZmine as an adduct, complex, or fragment of another peak within 10 ppm and +/− 0.1 min RT for each core grouped by LC/MS condition. C = Carex, E = Eriophorum, A = Site A or Low-Centered Polygon (LCP), and B = Site B or High-Centered Polygon (HCP). Figure S2: Box plots of raw log₂ abundance values for HILIC (−) dataset which had a systematic shift in quantitative values (a) due to experimental variation (i.e., instrumentation, ionization efficiency, extraction efficiency). Plot (b) shows how the normalization procedure removes this systematic error. Figure S3: PCA plots of raw log₂ abundance values from the blank, controls, and samples before normalization, imputation, and filtering procedures. Figure S4: (a) Histogram of the frequency of observations for each aligned peak (RT, MS¹, and MS²) across the entire dataset (all 4 cores), including blanks and controls (55 total runs), before any data quality filtering steps and (b) a histogram of the HQFs across the 36 samples, after filtering out zeros, duplicates, and signals that were observed in the blanks or controls. Figure S5: Venn diagrams showing the overlap of features between cores for each LC/MS condition. C = Carex aquatilis, E = Eriophorum angustifolium, A = Site A or low-centered polygon (LCP), and B = Site B or high-centered polygon (HCP). Figure S6: Venn diagram (top) showing the overlap of HQFs between LC/MS conditions across all four cores obtained by calculating the neutral molecule from the MS¹ data, [M+H]⁺ and [M-H]⁻ ions (+/− 0.005 Da), excluding isomers and isobars. Bar graphs (bottom) show the total number of unique HQFs observed by each LC/MS condition and the number of LMW DOM features that were observed only once or multiple times across the four conditions. Figure S7: PCAs of HQFs by LC/MS condition. Figure S8: Pie chart with the results from a coefficient of variance analysis for peak areas of the differentially abundant LMW DOM features. Figure S9: Distribution of molecular weight (m/z) and retention time (RT) for differentially abundant features, detected across the 36 extracts, separated by LC/MS condition. Figure S10: Distribution of molecular weight (m/z) against fold change for LMW DOM features that were differentially abundant due to polygon (E-HCP_E-LCP and C-HCP_C-LCP) or vegetation type (C-HCP_E-HCP and C-LCP_E-LCP). Figure S11: Distribution of m/z by core and depth; solid color = top, stripes = middle, dots = bottom, C = Carex aquatilis, E = Eriophorum angustifolium. Figure S12: Fragmentation spectrum of [M-H]⁻ ion at 192.0527 m/z showing characteristic neutral losses used for putative annotation.

Author Contributions

Conceptualization, M.P.L., S.D.W. and R.L.H.; methodology, M.P.L., D.T.R. and R.L.H.; software and statistical analysis, M.P.L. and S.P.; data analysis, M.P.L., D.T.R. and S.P.; data curation and validation, M.P.L. and C.M.I.; writing—original draft preparation, M.P.L.; writing—review and editing, M.P.L., D.T.R., S.P., C.M.I., S.D.W. and R.L.H.; supervision, R.L.H., S.D.W. and C.M.I.; project administration, R.L.H. and S.D.W.; funding acquisition, M.P.L., R.L.H. and S.D.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation (NSF) Graduate Research Fellowship Program (GRFP) Grant No. DGE−1452154 and the Next-Generation Ecosystem Experiments (NGEE_Arctic) project at Oak Ridge National Laboratory (ORNL). ORNL is managed by the UT-Battelle, LLC for the U.S. Department of Energy (DOE) and NGEE_Arctic is supported by the Office of Biological and Environmental Research in the U.S. DOE’s Office of Science.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The high-resolution LC/MS data generated and analyzed in the current study can be found through the NGEE-Arctic data portal at http://dx.doi.org/10.5440/1464956 [68].

Acknowledgments

The authors would like to thank Sarvesh Iyer, Ivan Villalobos-Solis, and Tyler King for their assistance with processing soil samples, and to the reviewers and editors for their valuable time and feedback. The authors would also like to thank Paul Abraham and Rich Giannone for technical assistance regarding instrumentation setup and operation.

Conflicts of Interest

The authors declare no conflict of interest.

Disclaimer

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures and Tables

View Image - Figure 1. Cross-sectional illustrations of a high- (left) and low-centered (right) ice wedge polygon, showing their different microtopographies (e.g., trough, ridge, center) and representations of possible variations in hydrology, vegetation, and thaw depth.

Figure 1. Cross-sectional illustrations of a high- (left) and low-centered (right) ice wedge polygon, showing their different microtopographies (e.g., trough, ridge, center) and representations of possible variations in hydrology, vegetation, and thaw depth.

View Image - Figure 2. Correlation plots for each LC/MS condition of the log2 peak areas (abundance) for the 36 samples analyzed after normalization procedures. C = Carex aquatilis, E = Eriophorum angustfolium, A = Site A or LCP, and B = Site B or HCP.

Figure 2. Correlation plots for each LC/MS condition of the log2 peak areas (abundance) for the 36 samples analyzed after normalization procedures. C = Carex aquatilis, E = Eriophorum angustfolium, A = Site A or LCP, and B = Site B or HCP.

View Image - Figure 3. Volcano plots showing the differentially abundant LWM DOM features due to (a) polygon type or (b) vegetation, highlighting the features that had a fold change greater than 4 and passed the paired t-test p-value threshold of 0.001 (indicated by the dotted lines).

Figure 3. Volcano plots showing the differentially abundant LWM DOM features due to (a) polygon type or (b) vegetation, highlighting the features that had a fold change greater than 4 and passed the paired t-test p-value threshold of 0.001 (indicated by the dotted lines).

Figure 4. van Krevelen plot for molecular formulas assigned to differentially abundant HQFs due to polygon type or vegetation.

Figure 5. van Krevelen plot using N/C ratio instead of O/C ratio to explore nitrogen dynamics in Arctic LMW DOM pool.

View Image - Figure 6. Heat map of normalized log2 abundances for differentially abundant LMW DOM features with 4 clusters called out for further analysis.

Figure 6. Heat map of normalized log2 abundances for differentially abundant LMW DOM features with 4 clusters called out for further analysis.

Table 1

LMW DOM coverage by HILIC and RP in positive- and negative-ion mode at each level of data filtering, expressed as the number of features detected across all 36 soil water extracts from 4 cores obtained from 2 polygon types and 2 species of vegetation.

	HILIC (+)	HILIC (−)	RP (+)	RP (−)
Aligned peaks ^a	4686	2853	4213	1921
Features ^b	4352	2249	3655	1762
High-Quality Features ^c	3929	2170	3618	1541
Unique HQFs ^d	3414	1942	3494	1287
Abundant HQFs ^e	1966	776	1259	99
Differentially abundant ^f	322	76	122	1
Annotated ^g	283	74	117	1

^a Aligned peaks with same RT, MS¹, and MS² data from MZmine, ^b After zeros and artifacts (observed in blank or control) were removed, ^c Single RT, MS¹ (duplicates removed) and corresponding MS² spectrum, ^d Number of features remaining after overlap analysis where isomers and isobars were removed, ^e Observed in at least three samples across each core, ^f Passed paired t-test p-value of <0.001 and FC > 4, ^g Database match within 5 ppm, MS/MS confirmation, and biologically relevant compound or elemental formula assignment using high-mass accuracy MS¹ data and element heuristics.

Table 2

Average molecular properties for HQFs that were in higher relative abundance due to polygon type or vegetation.

	HCP	LCP	Due to Vegetation	All DifferentiallyAbundant Features
Number of features	92	95	30	217
Average formula	C₁₇H_21.5O_4.9N_1.2S_2.0P_0.2	C_11.9H_7.8O_6.5N_1.1S_2.3P_0.5	C_20.3H_20.6O₆N_0.9S_0.5P_0.2	C_15.2H_15.4O_5.7N_1.1S_1.9P_0.3
m/z	393.0249	361.4926	398.1955	379.9353
DBE	7.88	9.51	11.5	9.09
AI	0.37	0.75	0.42	0.54
H/C	1.12	0.75	1.12	0.81
O/C	0.45	3.40	0.38	0.62
N/C	0.15	0.17	0.08	0.20
O/S	2.6	3.4	5.13	3.26
DBE/C	0.59	0.79	0.56	0.67
DBE/H	1.15	2.10	0.76	1.51
DBE/O	3.2	2.35	3.62	2.89
C:N	14.4	10.9	23.4	13.9
% lipid	16.3	2.11	6.67	8.75
% protein	7.61	6.32	6.67	6.91
% lignin	13.0	4.21	30.0	11.5
% carbohydrate	5.43	6.32	6.67	5.99
% unsaturated	12.0	4.21	10.0	8.29
% aromatic	30.4	51.6	33.3	40.1
% tannin	15.2	25.3	6.67	18.4

Table 3

Selection of LMW DOM features detected in higher relative abundance at each of the sites.

Site with Higher Relative Abundance	LC/MS Condition	m/z	Predicted Formula	ppm Error	Class	Fold Change (Erioph Cores)	Fold Change (Carex Cores)
LCP	HILIC (−)	347.8884	C₉H₂O₉P₁S₂	−4.7	Tannin	−11.63	−7.645
LCP	HILIC (−)	400.8711	C₇H₂N₂O₁₂S₃	3.5	Tannin	−11.01	−6.884
LCP	HILIC (−)	367.8842	C₈H₃NO₁₀S₃	−1.1	Tannin	−10.61	−8.179
LCP	HILIC (−)	400.8702	C₇H₂N₂O₁₂S₃	1.3	Tannin	−10.16	−6.893
LCP	HILIC (−)	192.0527	C₇H₇N₅O₂	0.1	Lignin	-	−5.111
HCP	HILIC (+)	702.5350	C₅₀H₆₈O₂	0.2	Unsaturated Hydrocarbon	9.189	8.014
HCP	HILIC (+)	506.8323	C₁₃H₄N₂O₅S₇	−1.9	Condensed Hydrocarbon	7.311	8.501
HCP	RP (+)	273.2535	C₁₅H₃₂N₂O₂	−0.7	Lipid	5.685	7.433
HCP	RP (+)	453.3682	C₂₆H₄₈N₂O₄	−1.1	Lipid	6.616	9.623
HCP	RP (+)	104.0705	C₄H₉NO₂	−1.2	Protein	8.020	8.594
Site with Higher Relative Abundance	LC/MS Condition	*m/z*	Predicted Formula	ppm Error	Class	Fold Change (HCP Cores)	Fold Change (LCP Cores)
Erioph	HILIC (+)	286.1138	C₉H₁₉NO₉	1.8	Carbohydrate	−5.562	−7.98873
Erioph	HILIC (+)	363.0902	C₁₀H₁₄N₆O₉	1.9	Tannin	-	−7.38981
Erioph	HILIC (+)	251.0764	C₉H₁₄O₈	0.9	Carbohydrate	−6.575	−6.22072
Erioph	RP (+)	459.1955	C₃₂H₂₆O₃	0	Unsaturated Hydrocarbon	−7.075	-
Erioph	RP (+)	548.2498	C₁₁H₂₁N₂₇O	0.5	Lipid	−7.523	-
Erioph	RP (+)	550.2340	C₁₈H₂₇N₁₅O₆	−0.3	Lignin	−8.613	-
Erioph	RP (+)	226.1285	C₈H₁₉NO₆	−0.2	Carbohydrate	−7.708	-
Carex	HILIC (+)	191.0233	C₁₂H₂N₂O	−3.8	Aromatic	7.139	-
Carex	HILIC (−)	236.8647	no hit	-	-	4.198	-
Carex	HILIC (−)	414.7706	no hit	-	-	4.127	-

Word count: 7340

Show less

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

Rising temperatures in the Arctic have led to the thawing of tundra soils, which is rapidly changing terrain, hydrology, and plant and microbial communities, causing hotspots of biogeochemical activity across the landscape. Despite this, little is known about how nutrient-rich low molecular weight dissolved organic matter (LMW DOM) varies within and across tundra ecosystems. Using a high-resolution nano-liquid chromatography-mass spectrometry (LC/MS) approach, we characterized the composition and availability of LMW DOM from high-centered polygons (HCP) and low-centered polygons (LCP) with Eriophorum angustifolium or Carex aquatilis as the dominant vegetation. Over 3000 unique features (i.e., discrete mass/charge ions) were detected; 521 were identified as differentially abundant between polygonal types and 217 were putatively annotated using high mass accuracy MS data. While polygon type was a strong predictor of LMW DOM composition and availability, vegetation and soil depth were also important drivers. Extensive evidence was found for enhanced microbial processing at the LCP sites, which were dominated by Carex plant species. We detected significant differences between polygon types with varying aboveground landscape features or properties, and hotspots of biogeochemical activity, indicating LMW DOM, as quantified by untargeted exometabolomics, provides a window into the dynamic complex interactions between landscape topography, vegetation, and organic matter cycling in Arctic polygonal tundra soils.

Details

Title

Untargeted Exometabolomics Provides a Powerful Approach to Investigate Biogeochemical Hotspots with Vegetation and Polygon Type in Arctic Tundra Soils

Author

Ladd, Mallory P¹; Reeves, David T¹; Poudel, Suresh²; Iversen, Colleen M³; Wullschleger, Stan D³; Hettich, Robert L⁴

¹ Bredesen Center for Interdisciplinary Research & Graduate Education, University of Tennessee, Knoxville, TN 37996, USA; [email protected] (M.P.L.); [email protected] (D.T.R.); Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN 37830, USA; [email protected]
² Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN 37830, USA; [email protected]; Genome Science & Technology, University of Tennessee, Knoxville, TN 37996, USA
³ Oak Ridge National Laboratory, Environmental Sciences Division, Climate Change Science Institute, Oak Ridge, TN 37830, USA; [email protected] (C.M.I.); [email protected] (S.D.W.)
⁴ Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN 37830, USA; [email protected]

First page

Publication year

2021

Publication date

2021

Publisher

MDPI AG

e-ISSN

25718789

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/soilsystems5010010

ProQuest document ID

2521488741

Untargeted Exometabolomics Provides a Powerful Approach to Investigate Biogeochemical Hotspots with Vegetation and Polygon Type in Arctic Tundra Soils

Jump to:

Full text

Abstract

Details

Suggested sources