Accurately assembling nanopore sequencing data of highly pathogenic bacteria

Abstract

Background

Bacterial genome exploration and outbreak analysis rely heavily on robust whole-genome sequencing and bioinformatics analysis. Widely used genomic methods, such as genotyping and detection of genetic markers, demand high sequencing accuracy and precise genome assembly for reliable results.

Methods

To assess the utility of nanopore sequencing for genotyping highly pathogenic bacteria with low mutation rates, we sequenced six reference strains with both Oxford Nanopore Technologies (ONT) R10.4.1 chemistry and Illumina, and evaluated different assembly strategies. The publicly available RefSeq assemblies served as the ground truth. Publicly available sequencing data from key foodborne and public-health-related bacterial pathogens were examined to provide a broader context for the analysis.

Results

While an almost perfect assembly was achieved for Bacillus (Ba.) anthracis, results varied for other species. For Brucella (Br.) spp., the final assemblies differed from the Sanger-sequenced references by five to 46 nucleotides. For some key foodborne and public-health-related bacterial pathogens (Klebsiella (K.) variicola, Listeria spp., Mycobacterium (M.) tuberculosis, Staphylococcus (Sta.) aureus, and Streptococcus (Str.) pyogenes), perfect genomes were obtained. Enhanced basecalling models generally improved assembly accuracy; however, for certain species such as Br. abortus, older models produced higher accuracy. Long-read polishing mostly improved assembly quality, with only one round needed, but our results indicate that it can also degrade assembly quality. Overall, 81% of the observed errors in ONT assemblies were located within coding sequences (CDS). Furthermore, we found that methylation caused 6.5% of the errors, and the bacterial methylation-aware medaka polishing model reduced the number of errors linked to methylation. Core-genome Multilocus Sequence Typing (cgMLST) analysis revealed allele differences in Ba. anthracis, Br. abortus, and Francisella (F.) tularensis for some assemblers, although these amounted to fewer than five allele differences. In the case of Br. melitensis, some assemblies showed five allele differences, whereas for Br. suis the correct cgMLST alleles were observed.
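To make the notion of a cgMLST allele difference concrete, the following is a minimal Python sketch of how two allele profiles might be compared. It is not the authors' pipeline. The function name, the locus names, and the allele numbers are hypothetical, and skipping loci without an allele call in either profile is an assumed convention.

```python
from typing import Dict, Optional

# A cgMLST profile maps a locus name to an allele number (None = no call).
Profile = Dict[str, Optional[int]]


def allele_differences(reference: Profile, assembly: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ.

    Assumption: loci missing or uncalled in either profile are skipped
    rather than counted as differences.
    """
    shared_loci = reference.keys() & assembly.keys()
    return sum(
        1
        for locus in shared_loci
        if reference[locus] is not None
        and assembly[locus] is not None
        and reference[locus] != assembly[locus]
    )


# Toy profiles: hypothetical locus names and allele numbers, not study data.
ref_profile: Profile = {"locus_001": 1, "locus_002": 4, "locus_003": 2, "locus_004": 7}
ont_profile: Profile = {"locus_001": 1, "locus_002": 5, "locus_003": None, "locus_004": 7}

print(allele_differences(ref_profile, ont_profile))
```

In this toy case only locus_002 counts as a difference, because locus_003 has no call in the second profile. Whether missing loci are ignored or penalized depends on the typing scheme and software in use.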

Conclusions

Assemblies of nanopore data from pathogenic bacteria vary in quality across species and methods. Errors persist in the final assemblies, including within cgMLST loci, which influences the reliability of outbreak predictions. Nevertheless, specific combinations of existing tools can generate perfect genome assemblies from bacterial ONT sequencing data for outbreak analysis without short-read polishing.

Details

Title

Accurately assembling nanopore sequencing data of highly pathogenic bacteria

Author

Thomas, Christine; Brangsch, Hanka; Galeone, Valentina; Hölzer, Martin; Marz, Manja; Linde, Jörg

Pages

1-19

Section

Research

Publication year

2025

Publication date

2025

Publisher

Springer Nature B.V.

e-ISSN

1471-2164

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1186/s12864-025-11793-6

ProQuest document ID

3247110401

© 2025. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
