Prot-SpaM: Fast alignment-free phylogeny reconstruction based on whole-proteome sequences

Abstract

Word-based or 'alignment-free' sequence comparison has become an active research area in bioinformatics. While previous word-frequency approaches calculated rough measures of sequence similarity or dissimilarity, some new alignment-free methods can accurately estimate phylogenetic distances between genomic sequences. One of these approaches is Filtered Spaced Word Matches. Herein, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM. We compare the performance of Prot-SpaM to other alignment-free methods on simulated sequences and on various groups of eukaryotic and prokaryotic taxa. Prot-SpaM can be used to calculate high-quality phylogenetic trees for dozens of whole-proteome sequences in a matter of seconds or minutes and often outperforms other alignment-free approaches. The source code of our software is available through GitHub: https://github.com/jschellh/ProtSpaM

Details

Title

Prot-SpaM: Fast alignment-free phylogeny reconstruction based on whole-proteome sequences

Author

Leimeister, Chris-Andre; Schellhorn, Jendrik; Schöbel, Svenja; Gerth, Michael; Bleidorn, Christoph; Morgenstern, Burkhard

University/institution

Cold Spring Harbor Laboratory Press

Section

New Results

Publication year

2018

Publication date

Sep 7, 2018

Publisher

Cold Spring Harbor Laboratory Press

ISSN

2692-8205

Source type

Working Paper

Language of publication

English

DOI

https://doi.org/10.1101/306142

ProQuest document ID

2071208574

© 2018. This article is published under http://creativecommons.org/licenses/by-nd/4.0/ ("the License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
