Predicting Antimicrobial Resistance Using Conserved Genes

Abstract

A growing number of studies have shown that machine learning algorithms can accurately predict antimicrobial resistance (AMR) phenotypes from bacterial sequence data. In these studies, models are typically trained on input features derived from comprehensive sets of known AMR genes or from whole genome sequences. However, when sequencing contaminated or metagenomic samples, it can be difficult to determine whether genomes and their corresponding sets of AMR genes are complete. In this study, we explore the possibility of using incomplete genome sequence data to predict AMR phenotypes. Machine learning models were built from randomly selected sets of core genes that are held in common among the members of a species, with the AMR-conferring genes removed based on their protein annotations. For Klebsiella pneumoniae, Mycobacterium tuberculosis, Salmonella enterica, and Staphylococcus aureus, we report that with as few as 100 conserved non-AMR genes it is possible to classify susceptible and resistant phenotypes with average F1 scores ranging from 0.80 to 0.89, very major error rates ranging from 0.11 to 0.23, and major error rates ranging from 0.10 to 0.20. Models built from core genes have predictive power in cases where the primary AMR mechanism results from SNPs or horizontal gene transfer. By randomly sampling non-overlapping sets of core genes for use in these models, we show that F1 scores and error rates are stable and vary little between replicates. Potential biases from strain-specific SNPs, phylogenetic sampling, and imbalances in the phylogenetic distribution of susceptible and resistant strains do not appear to have an impact on this result. Although these small core gene models have lower accuracies and higher error rates than models built from the corresponding assembled genomes, the results suggest that sufficient variation exists in the core non-AMR genes of a species for predicting AMR phenotypes. Overall, this study suggests that building models from conserved genes may be a useful strategy for predicting AMR phenotypes when genomes are incomplete.
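
The sampling and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code (their repository is linked under Footnotes), and it rests on several assumptions: the function names `disjoint_gene_sets` and `amr_metrics` are hypothetical, phenotypes are encoded as the strings "R" (resistant) and "S" (susceptible), a very major error is taken to be a resistant isolate predicted as susceptible, a major error is taken to be a susceptible isolate predicted as resistant, and the F1 score is macro-averaged over the two classes. The abstract does not state which F1 averaging was used.

```python
import random


def disjoint_gene_sets(core_genes, set_size=100, n_sets=10, seed=0):
    """Draw n_sets random, non-overlapping subsets of core_genes.

    Assumption: shuffling once and slicing is one simple way to get
    disjoint random sets; the paper's exact procedure may differ.
    """
    if set_size * n_sets > len(core_genes):
        raise ValueError("not enough core genes for the requested disjoint sets")
    rng = random.Random(seed)
    shuffled = list(core_genes)
    rng.shuffle(shuffled)
    return [shuffled[i * set_size:(i + 1) * set_size] for i in range(n_sets)]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def amr_metrics(y_true, y_pred):
    """Compute macro F1, very major error rate, and major error rate.

    y_true and y_pred are equal-length sequences of "R" or "S".
    """
    pairs = list(zip(y_true, y_pred))
    rr = sum(t == "R" and p == "R" for t, p in pairs)  # resistant, called resistant
    rs = sum(t == "R" and p == "S" for t, p in pairs)  # very major errors
    sr = sum(t == "S" and p == "R" for t, p in pairs)  # major errors
    ss = sum(t == "S" and p == "S" for t, p in pairs)  # susceptible, called susceptible

    f1_resistant = _ratio(2 * rr, 2 * rr + sr + rs)
    f1_susceptible = _ratio(2 * ss, 2 * ss + sr + rs)
    return {
        "macro_f1": (f1_resistant + f1_susceptible) / 2,
        "very_major_error_rate": _ratio(rs, rr + rs),  # fraction of resistant isolates missed
        "major_error_rate": _ratio(sr, sr + ss),       # fraction of susceptible isolates flagged
    }
```

In use, each gene set returned by `disjoint_gene_sets` would feed one replicate model, and `amr_metrics` would be applied to that replicate's held-out predictions. Comparing the resulting values across replicates is one way to check the stability the abstract describes.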

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* https://github.com/jimdavis1/Core-Gene-AMR-Models

Details

Title

Predicting Antimicrobial Resistance Using Conserved Genes

Author

Nguyen, Marcus; Olson, Robert; Shukla, Maulik; Vanoeffelen, Margo; Davis, James J

University/institution

Cold Spring Harbor Laboratory Press

Section

New Results

Publication year

2020

Publication date

Apr 29, 2020

Publisher

Cold Spring Harbor Laboratory Press

ISSN

2692-8205

Source type

Working Paper

Language of publication

English

DOI

https://doi.org/10.1101/2020.04.29.068254

ProQuest document ID

2396056983

© 2020. This article is published under https://creativecommons.org/publicdomain/zero/1.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
