Abstract

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI’s Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1–10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, half of which came from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084.

Details

Title
Rapid and sensitive detection of genome contamination at scale with FCS-GX
Author
Astashyn, Alexander; Tvedte, Eric S; Sweeney, Deacon; Sapojnikov, Victor; Bouk, Nathan; Joukov, Victor; Mozes, Eyal; Strope, Pooja K; Sylla, Pape M; Wagner, Lukas; Bidwell, Shelby L; Brown, Larissa C; Clark, Karen; Davis, Emily W; Smith-White, Brian; Hlavina, Wratko
Pages
1-25
Section
Method
Publication year
2024
Publication date
2024
Publisher
BioMed Central
ISSN
1474-7596
e-ISSN
1474-760X
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2956876315
Copyright
© 2024. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.