Abstract
We can now assign about two thirds of the sequences from completed genomes to as few as 1400 domain families for which structures are known, allowing more ancient evolutionary relationships to be established. About 200 of these domain families are common to all kingdoms of life and account for nearly 50% of the domain structure annotations in the genomes. Some of these domain families have been very extensively duplicated within a genome and combined with different domain partners, giving rise to different multidomain proteins. The ways in which these domain combinations evolve tend to be specific to the organism, so that fewer than 15% of the protein families found within a genome appear to be common to all kingdoms of life. Recent analyses of completed genomes, exploiting the structural data, have revealed the extent to which duplication of these domains and modification of their functions can expand the functional repertoire of an organism, contributing to increasing complexity.
Key Words: protein classifications, comparative genomics, bioinformatics
INTRODUCTION
The elucidation of the double-helical structure of DNA by Watson and Crick, over 50 years ago, started a new era in evolutionary biology. Alongside these insights came the revolutionary technologies for sequencing proteins, developed by Sanger in the early 1950s. These were quantum leaps in biology. The resulting expansion of the datasets of known protein sequences by the international genome projects, together with significant advances in computational methods for detecting similarities between evolutionarily related genes, now promises to yield profound insights into the evolution of proteins, their functions, and the biological processes in which they participate.
The mechanisms by which genomic DNA can change during evolution are now being elucidated, thanks to the explosion of data from these sequencing projects and the growing diversity of genomes sequenced from all kingdoms of life. Many proteins in these organisms comprise more than one domain (for example, see Figure 10, below). Although the importance of domain duplication in evolution has long been recognized, analyses of completed genomes have revealed just how extensive this duplication is (1). In prokaryotes, at least 70% of the domains have been duplicated, whereas in eukaryotes the figure appears to be as high as 90% (2).
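To make duplication percentages of this kind concrete, the following is a minimal sketch of how such a figure might be computed once domain assignments are available. It assumes each protein has already been assigned an ordered list of domain families, and it treats a domain as "duplicated" if its family occurs more than once in the same genome; the cited studies may define the measure differently. The toy genome and the family labels (`famA` to `famE`) are hypothetical. Because the result depends on whether one counts domain occurrences or distinct families, both measures are shown.

```python
from collections import Counter

def family_counts(architectures):
    """Count the domain occurrences of each family across all proteins in a genome."""
    return Counter(fam for arch in architectures.values() for fam in arch)

def duplicated_domain_fraction(architectures):
    """Share of domain occurrences that belong to a family seen more than once."""
    counts = family_counts(architectures)
    total = sum(counts.values())
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / total if total else 0.0

def duplicated_family_fraction(architectures):
    """Share of distinct families that occur more than once in the genome."""
    counts = family_counts(architectures)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 1) / len(counts)

# Hypothetical toy genome: protein -> ordered domain architecture (family labels).
toy_genome = {
    "p1": ["famA", "famB"],   # two-domain protein
    "p2": ["famA"],
    "p3": ["famC", "famA"],   # famA recombined with a different partner
    "p4": ["famD", "famD"],   # internal duplication within one chain
    "p5": ["famE"],
}

print(duplicated_domain_fraction(toy_genome))   # 0.625 (5 of 8 domains)
print(duplicated_family_fraction(toy_genome))   # 0.4 (2 of 5 families)
```

In this toy example, 62.5% of domain occurrences are duplicated but only 40% of families are, which illustrates why it matters which quantity a reported percentage refers to.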
Computational analyses of data from both prokaryotic and eukaryotic...





