© 2022, Vanni et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.

Details

Title
Unifying the known and unknown microbial coding sequence space
Author
Vanni, Chiara; Schechter, Matthew S; Acinas, Silvia G; Barberán, Albert; Buttigieg, Pier Luigi; Casamayor, Emilio O; Delmont, Tom O; Duarte, Carlos M; Eren, A Murat; Finn, Robert D; Kottmann, Renzo; Mitchell, Alex; Sánchez, Pablo; Siren, Kimmo; Steinegger, Martin; Gloeckner, Frank Oliver; Fernàndez-Guerra, Antonio
University/institution
U.S. National Institutes of Health/National Library of Medicine
Publication year
2022
Publication date
2022
Publisher
eLife Sciences Publications Ltd.
e-ISSN
2050-084X
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2671921386