Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life

Abstract

In this study, we investigate how an organism’s codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.
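As a rough illustration of the two ideas in the abstract, the sketch below turns a nucleotide sequence into a 64-dimensional codon usage vector (the kind of feature a classifier could be trained on) and lists candidate ORFs in the three forward reading frames. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `codon_frequencies` and `candidate_orfs`, the `min_codons` parameter, the use of ATG as the only start codon, and the standard stop codons are all illustrative choices that do not come from the paper.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to a comparable vector.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code stop codons


def codon_frequencies(seq):
    """Relative frequency of each codon, reading the sequence in frame 0."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)  # skips codons with ambiguous bases
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def candidate_orfs(seq, min_codons=2):
    """Yield (start, end) spans from an ATG to the next in-frame stop codon.

    Only the three forward frames are scanned (hypothetical simplification).
    """
    seq = seq.upper().replace("U", "T")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield (start, i + 3)
                start = None


if __name__ == "__main__":
    sample = "ATGGCCGCCTAA"
    freqs = codon_frequencies(sample)
    print(freqs[CODONS.index("GCC")])
    print(list(candidate_orfs(sample)))
```

In a setup like the one the abstract describes, each candidate span could be scored by comparing its codon frequency vector with a reference codon usage distribution. The scoring rule itself is not specified in this record, so it is left out here.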

Details

Title

Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life

Author

Hallee, Logan¹; Khomtchouk, Bohdan B.²

¹ University of Delaware, Center for Bioinformatics and Computational Biology, Newark, USA (GRID:grid.33489.35) (ISNI:0000 0001 0454 4791)
² Indiana University, Department of BioHealth Informatics, Center for Computational Biology and Bioinformatics, Indianapolis, USA (GRID:grid.257413.6) (ISNI:0000 0001 2287 3919)

Pages

2088

Publication year

2023

Publication date

2023

Publisher

Nature Publishing Group

e-ISSN

20452322

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/s41598-023-28965-7

ProQuest document ID

2773494302

© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
