Full text

Turn on search term navigation

© 2021 Youngblut and Ley. This is an open access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Mapping metagenome reads to reference databases is the standard approach for assessing microbial taxonomic and functional diversity from metagenomic data. However, public reference databases often lack recently generated genomic data such as metagenome-assembled genomes (MAGs), which can limit the sensitivity of read-mapping approaches. We previously developed the Struo pipeline in order to provide a straight-forward method for constructing custom databases; however, the pipeline does not scale well enough to cope with the ever-increasing number of publicly available microbial genomes. Moreover, the pipeline does not allow for efficient database updating as new data are generated. To address these issues, we developed Struo2, which is >3.5 fold faster than Struo at database generation and can also efficiently update existing databases. We also provide custom Kraken2, Bracken, and HUMAnN3 databases that can be easily updated with new genomes and/or individual gene sequences. Efficient database updating, coupled with our pre-generated databases, enables “assembly-enhanced” profiling, which increases database comprehensiveness via inclusion of native genomic content. Inclusion of newly generated genomic content can greatly increase database comprehensiveness, especially for understudied biomes, which will enable more accurate assessments of microbiome diversity.

Details

Title
Struo2: efficient metagenome profiling database construction for ever-expanding microbial genome datasets
Author
Youngblut, Nicholas D; Ley, Ruth E
Publication year
2021
Publication date
Sep 16, 2021
Publisher
PeerJ, Inc.
e-ISSN
21678359
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2573113003
Copyright
© 2021 Youngblut and Ley. This is an open access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.