Abstract

Motivation: Taxonomic analysis of environmental microbial communities is now routinely performed thanks to advances in DNA sequencing. Determining the role of these communities in global biogeochemical cycles requires identifying their metabolic functions, such as hydrogen oxidation, sulfur reduction, and carbon fixation. These functions can be inferred directly from metagenomic data, but in many environmental applications metabarcoding is still the method of choice. Reconstructing metabolic functions from metabarcoding data and integrating them into coarse-grained representations of biogeochemical cycles remains a difficult bioinformatics problem.

Results: We developed a pipeline, called Tabigecy, which exploits taxonomic affiliations to predict the metabolic functions that constitute biogeochemical cycles. First, Tabigecy uses the tool EsMeCaTa to predict consensus proteomes from the input affiliations. To optimise this step, we generated a precomputed database containing information about 2,404 taxa from UniProt. The consensus proteomes are then searched with bigecyhmm, a newly developed Python package that relies on Hidden Markov Models to identify key enzymes involved in the metabolic functions of biogeochemical cycles. The metabolic functions are finally projected onto a coarse-grained representation of the cycles. We applied Tabigecy to two salt cavern datasets and validated its predictions against microbial activity and hydrochemistry measurements performed on the samples. The results highlight the utility of the approach for investigating the impact of microbial communities on biogeochemical processes.

Availability: The Tabigecy pipeline is available at https://github.com/ArnaudBelcour/tabigecy. The Python package bigecyhmm and the precomputed EsMeCaTa database are also separately available at https://github.com/ArnaudBelcour/bigecyhmm and https://doi.org/10.5281/zenodo.13354073, respectively.
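The last step of the pipeline, going from key-enzyme hits to functions counted across a community, can be illustrated with a minimal Python sketch. This is not the bigecyhmm API: the marker names, the taxon names, and the rule that a single marker hit is enough to call a function are all assumptions made for illustration. Only the three function names come from the abstract.

```python
from collections import Counter

# Hypothetical mapping from a metabolic function to the names of its marker
# profiles. These are placeholders, not real bigecyhmm HMM profiles.
FUNCTION_MARKERS = {
    "hydrogen oxidation": {"marker_h1", "marker_h2"},
    "sulfur reduction": {"marker_s1"},
    "carbon fixation": {"marker_c1", "marker_c2"},
}


def predict_functions(hits_per_taxon):
    """Return, for each taxon, the functions with at least one marker hit.

    Assumption: one matching marker is sufficient to predict a function.
    """
    return {
        taxon: {
            function
            for function, markers in FUNCTION_MARKERS.items()
            if markers & hits
        }
        for taxon, hits in hits_per_taxon.items()
    }


def function_occurrence(functions_per_taxon):
    """Count how many taxa in the community carry each function."""
    counts = Counter()
    for functions in functions_per_taxon.values():
        counts.update(functions)
    return counts


# Hypothetical marker hits found in the consensus proteome of each taxon.
hits = {
    "taxon_A": {"marker_h1", "marker_c2"},
    "taxon_B": {"marker_s1"},
    "taxon_C": {"marker_h2"},
}

functions = predict_functions(hits)
occurrence = function_occurrence(functions)

for function, count in sorted(occurrence.items()):
    print(f"{function}: {count} taxa")
```

The per-function counts are the kind of summary that could then be attached to the nodes or edges of a coarse-grained cycle diagram. Weighting each taxon by its abundance in a sample would be a natural extension, but it is not shown here.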

Competing Interest Statement

The authors have declared no competing interest.

Details

Title: Predicting coarse-grained representations of biogeochemical cycles from metabarcoding data
Publication title: bioRxiv; Cold Spring Harbor
Publication year: 2025
Publication date: Feb 1, 2025
Section: New Results
Publisher: Cold Spring Harbor Laboratory Press
Source: BioRxiv
Place of publication: Cold Spring Harbor
Country of publication: United States
University/institution: Cold Spring Harbor Laboratory Press
ISSN: 2692-8205
Source type: Working Paper
Language of publication: English
Document type: Working Paper
ProQuest document ID: 3162417145
Document URL: https://www.proquest.com/working-papers/predicting-coarse-grained-representations/docview/3162417145/se-2?accountid=208611
Copyright: © 2025. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated: 2025-02-02
Database: ProQuest One Academic