How sequence alignment scores correspond to probability models

Abstract

Sequence alignment remains fundamental in bioinformatics. Pairwise alignment is traditionally based on ad hoc scores for substitutions, insertions, and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to fit the parameters to each kind of data, calculate the reliability of alignment parts, and measure sequence similarity integrated over possible alignments. This study shows how multiple models correspond to one set of scores. Scores can be converted to probabilities by partition functions with a "temperature" parameter: for any temperature, this corresponds to some PHMM. There is a special class of models with balanced length probability, i.e. no bias towards either longer or shorter alignments. The best way to score alignments and assess their significance depends on the aim: judging whether whole sequences are related versus finding related parts. This clarifies the statistical basis of sequence alignment.
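The abstract's central conversion, from alignment scores to probabilities through a partition function with a temperature T, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's method: global alignment, a hypothetical toy scoring scheme (match +1, mismatch -1, linear gap -2), and alignment weights exp(score / T). The function names `partition_function`, `alignment_score`, and `alignment_probability` are hypothetical. The partition function sums exp(score / T) over all alignments with a Forward-style recursion, which is one way to measure similarity integrated over possible alignments; dividing a single alignment's weight by it gives that alignment's probability. The sketch counts an insertion next to a deletion, in either order, as two distinct alignments, and it does not construct the PHMM that the paper says corresponds to each temperature.

```python
import math

# Toy scoring scheme (hypothetical values, chosen only for illustration):
MATCH = 1.0      # score for aligning identical letters
MISMATCH = -1.0  # score for aligning different letters
GAP = -2.0       # score per inserted or deleted letter (linear gap cost)


def sub_score(a: str, b: str) -> float:
    """Substitution score for aligning letter a with letter b."""
    return MATCH if a == b else MISMATCH


def partition_function(x: str, y: str, T: float = 1.0) -> float:
    """Sum of exp(score / T) over all global alignments of x and y.

    Forward-style dynamic programming: Z[i][j] holds the summed weight of
    all alignments of the prefixes x[:i] and y[:j].
    """
    m, n = len(x), len(y)
    gap_weight = math.exp(GAP / T)
    Z = [[0.0] * (n + 1) for _ in range(m + 1)]
    Z[0][0] = 1.0  # the empty alignment has score 0, so weight exp(0) = 1
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:  # last column aligns x[i-1] with y[j-1]
                total += Z[i - 1][j - 1] * math.exp(sub_score(x[i - 1], y[j - 1]) / T)
            if i > 0:  # last column: x[i-1] aligned to a gap
                total += Z[i - 1][j] * gap_weight
            if j > 0:  # last column: y[j-1] aligned to a gap
                total += Z[i][j - 1] * gap_weight
            Z[i][j] = total
    return Z[m][n]


def alignment_score(top: str, bottom: str) -> float:
    """Score of one alignment given as two equal-length gapped rows ('-' = gap)."""
    score = 0.0
    for a, b in zip(top, bottom):
        score += GAP if a == "-" or b == "-" else sub_score(a, b)
    return score


def alignment_probability(top: str, bottom: str, T: float = 1.0) -> float:
    """Probability of one alignment: exp(score / T) divided by the partition function."""
    x, y = top.replace("-", ""), bottom.replace("-", "")
    return math.exp(alignment_score(top, bottom) / T) / partition_function(x, y, T)


print(alignment_probability("A", "A"))         # about 0.987
print(alignment_probability("A", "A", T=0.5))  # closer to 1 at lower temperature
```

For x = y = "A", the three alignments have weights e, e^-4, and e^-4, so the matched alignment gets probability e / (e + 2e^-4), about 0.987 at T = 1. Lowering T concentrates probability further on the highest-scoring alignment, while raising T spreads it over more alignments.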

Details

Title

How sequence alignment scores correspond to probability models

Author

Frith, Martin

University/institution

Cold Spring Harbor Laboratory Press

Section

New Results

Publication year

2019

Publication date

Mar 18, 2019

Publisher

Cold Spring Harbor Laboratory Press

ISSN

2692-8205

Source type

Working Paper

Language of publication

English

DOI

https://doi.org/10.1101/580951

ProQuest document ID

2193409663

© 2019. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
