ALIGNMENT-FREE PHYLOGENETIC OUTLINE OF A RANDOM-SEQUENCE LIBRARY OF NON-BIOLOGICAL PROTEINS


Headnote

Abstract - To assess the degree of randomness and complexity of the randomly generated sequences in an in vitro selection experiment by Keefe and Szostak [1], we calculated the Kolmogorov complexity, the algorithmic redundancy, and the Shannon entropy of the sequences. We built an alignment-free phylogenetic tree, using the algorithmic information distance between each pair of sequences to construct the distance matrix. The tree represents the history of the set of molecular sequences and allows us to follow in more detail how chemical function improves with respect to the original sequence. We note that in directed evolution the highly predominant changes are between neighboring codons. Thus, the amino acid changes in the protein are not arbitrary but are dictated by the amino acid assignments in the genetic code.

Keywords: Kolmogorov complexity, Shannon entropy of the sequences, algorithmic redundancy, phylogenetic tree, non-biological proteins.
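
As an illustration of the quantities named in the abstract, the following minimal Python sketch computes a per-symbol Shannon entropy and an alignment-free pairwise distance matrix. It rests on assumptions that are not taken from the paper: Kolmogorov complexity is uncomputable, so the sketch substitutes the zlib-compressed length as a rough upper-bound proxy, and it uses the normalized compression distance (NCD) as a stand-in for the algorithmic information distance. The function names are hypothetical, and the estimator actually used by the authors is not specified in this excerpt.

```python
import math
import zlib
from collections import Counter
from itertools import combinations


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits per symbol, from observed symbol frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compressed_size(seq: str) -> int:
    """Bytes after zlib compression (assumed proxy for Kolmogorov complexity)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance, a computable stand-in for the
    algorithmic information distance between two sequences."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise NCD values with a zero diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = ncd(seqs[i], seqs[j])
    return d
```

A matrix produced by `distance_matrix` could then be passed to any standard distance-based tree-building method, such as neighbor joining. For sequences as short as the roughly 80-residue proteins discussed here, the fixed overhead of a general-purpose compressor is large relative to the sequence length, so the resulting values should be read as a coarse illustration only.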


"It seems as though biologists are extraordinarily fond of randomness. A population is defined as one, randomly mating, interbreeding unit, although truly random mating would hardly be practicable in a reasonably large population. Similarly, spontaneous mutations are viewed as randomly sustained base substitutions, in spite of our knowledge of mutational hot spots. I suspect that this extraordinarily strong belief in randomness stems from our too strong faith in the power of natural selection."

S. Ohno, [24]

1. Introduction

The frequency of occurrence of functional proteins in collections of randomly generated sequences is an important constraint on models of the evolution of biological proteins. Therefore, the experimental determination of this frequency, by isolating proteins with a specific function from a large random-sequence library of known size, is a relevant endeavor in this field. In an effort to substantiate the hypothesis that primordial functional proteins originated from random sequences, Keefe and Szostak [1] used in vitro selection of mRNA-displayed proteins to sample a large population of distinct randomly generated sequences.

Starting from a library of 6 × 10^12 polypeptides, each containing 81 contiguous randomly chosen amino acids, they selected functional proteins by enriching for those that bind to ATP. As a result, following eight rounds of selection, they obtained four new ATP-binding protein families, designated A, B, C, D (Fig. 3a of their paper), that appear to be...
