Abstract

Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. Because it indexes minimizer digests rather than full sequences, SPUMONI 2's index is 65 times smaller than minimap2's for a mock community pangenome. SPUMONI 2 is 3-fold faster than SPUMONI and 15-fold faster than minimap2. We show that SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification.
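The abstract names two ingredients: minimizer digests, which shrink the indexed text, and a document array, which labels each suffix-array row with the reference (class) it came from. The Python sketch here illustrates only those two ideas under stated assumptions. It orders k-mers lexicographically instead of by a hash, uses toy parameters (k=3, w=2), builds a plain, unsampled document array with a naive suffix sort, and answers exact-match queries. It does not reproduce SPUMONI 2's index or its sampling scheme. The helper names (`minimizer_digest`, `build_document_array`, `classes_containing`) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def minimizer_digest(seq: str, k: int, w: int) -> str:
    """Concatenate the minimizers of seq (hypothetical helper).

    Assumption: a minimizer is the leftmost lexicographically smallest
    k-mer in each window of w consecutive k-mers. A k-mer selected by
    several overlapping windows is emitted only once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[int] = []  # start positions of selected k-mers
    for start in range(len(kmers) - w + 1):
        # min() returns the first minimum, i.e. the leftmost on ties.
        best = start + min(range(w), key=lambda j: kmers[start + j])
        # Selected positions never decrease, so checking the last suffices.
        if not picked or picked[-1] != best:
            picked.append(best)
    return "".join(kmers[p] for p in picked)


def build_document_array(docs: list[str]) -> tuple[str, list[int], list[int]]:
    """Concatenate docs with '$' separators; return text, SA and DA.

    DA[r] is the index of the document containing suffix SA[r].
    """
    parts: list[str] = []
    owner: list[int] = []  # document id for every text position
    for d, doc in enumerate(docs):
        parts.append(doc + "$")
        owner.extend([d] * (len(doc) + 1))
    text = "".join(parts)
    # Naive O(n^2 log n) suffix sort: fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    da = [owner[i] for i in sa]
    return text, sa, da


def classes_containing(pattern: str, text: str, sa: list[int], da: list[int]) -> set[int]:
    """Return the documents containing pattern as an exact substring."""
    m = len(pattern)
    # Truncating sorted suffixes keeps them sorted, so the suffixes that
    # start with pattern form one contiguous SA interval.
    prefixes = [text[i:i + m] for i in sa]
    lo, hi = bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)
    return set(da[lo:hi])


refs = ["ACGTACGGTCA", "TTTTTTTTTTT"]  # two toy reference "classes"
text, sa, da = build_document_array([minimizer_digest(r, 3, 2) for r in refs])
read = refs[0][2:9]  # a read drawn from reference 0
print(classes_containing(minimizer_digest(read, 3, 2), text, sa, da))  # {0}
```

Because a read taken from a reference sees the same windows as that reference, its digest is a substring of the reference's digest, so classification can run on the much shorter digests. Matches that straddle k-mer boundaries can produce false positives, which this sketch makes no attempt to filter.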

Details

Title
SPUMONI 2: improved classification using a pangenome index of minimizer digests
Author
Ahmed, Omar Y; Rossi, Massimiliano; Gagie, Travis; Boucher, Christina; Langmead, Ben
Pages
1-15
Section
Software
Publication year
2023
Publication date
2023
Publisher
BioMed Central
ISSN
1474-7596
e-ISSN
1474-760X
Source type
Scholarly Journal
Language of publication
English
Copyright
© 2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.