Abstract

The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database used. Here, we present the random-forest-based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide set of species with known pathogenicity phenotypes. To that end, we generated a novel label source of pathogenic and non-pathogenic bacterial strains, using a rule-based protocol to annotate pathogenicity based on genome metadata. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction, whereas other approaches discard a large number of sequencing reads that are far away from currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results.

Details

Title
PaPrBaG: A random forest approach for the detection of novel pathogens from NGS data
Author
Deneke, Carlus; Rentzsch, Robert; Renard, Bernhard Y
Publication year
2016
Publication date
Aug 19, 2016
Publisher
PeerJ, Inc.
e-ISSN
2167-9843
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1953809699
Copyright
© 2016 Deneke et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.