
Abstract

The genetic code, a unifying principle in biology, ensures that all organisms, stemming from a Last Universal Common Ancestor (LUCA), share fundamental rules for translating DNA into proteins. However, codon usage varies across the tree of life, influenced not only by GC-content and proteome composition but also by complex, often less understood rules dependent on each species’ evolutionary trajectory. To better understand these rules, we segregated codons into their functional parts and applied Shannon’s information-theoretic measures to 1,434 species from eight diverse taxonomic groups. We provide robust evidence that the first codon base plays a central role in amino acid determination, while the third base serves an accessory function. Using conditional entropy measures, we rigorously quantified this relationship, universally confirming the greater informational variability of the third base across all sampled species for the first time at this scale. Our analysis revealed significant heterogeneity in coding strategies across different taxonomic groups. Notably, the unique variability observed in Archaea, in contrast to the more constrained patterns in Eukaryotes and Bacteria, underscores the profound influence of evolutionary pressures and distinct life histories on genetic information processing. The identification of outlier species, exhibiting distinct informational profiles, highlights specific instances where unusual lifestyles or ecological niches may have driven unique adaptations in codon usage and underlying informational dependencies. These informational patterns offer a complementary perspective to traditional phylogenetic analyses, further revealing a hierarchical organization of informational dependencies among codon components that sheds light on the intricate grammar of genetic information. 
We also rigorously investigated the relationship between GC-content and our informational measures, concluding that these entropy measures provide valuable insights that cannot be obtained from GC-content alone. This work not only offers a novel framework for quantifying informational properties of codon usage but also reveals previously unappreciated aspects of how genetic information is encoded and processed across life’s domains.
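
As an illustration of the kind of measure the abstract describes, the following minimal Python sketch computes the Shannon entropy of the first and third codon bases, and the conditional entropy H(B3 | B1), from a table of codon counts. The abstract does not state the exact definitions used in the paper, so the function names, the toy counts, and the choice of H(B3 | B1) as the conditional measure are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values() if n > 0)


def position_entropies(codon_counts):
    """Entropy of base 1, base 3, and base 3 given base 1.

    codon_counts maps three-letter codons to observed counts.
    This is a hypothetical helper, not the authors' pipeline.
    """
    b1, b3, b1b3 = Counter(), Counter(), Counter()
    for codon, n in codon_counts.items():
        b1[codon[0]] += n
        b3[codon[2]] += n
        b1b3[(codon[0], codon[2])] += n
    h1 = entropy(b1)
    h3 = entropy(b3)
    # Chain rule: H(B3 | B1) = H(B1, B3) - H(B1)
    h3_given_1 = entropy(b1b3) - h1
    return {"H(B1)": h1, "H(B3)": h3, "H(B3|B1)": h3_given_1}


if __name__ == "__main__":
    # Toy counts (invented): the four alanine codons, used equally often.
    toy = {"GCU": 10, "GCC": 10, "GCA": 10, "GCG": 10}
    print(position_entropies(toy))
```

In this toy case the first base is fixed and the third base is uniform over four letters, so all of the variability sits in the third position. Real codon usage tables would give intermediate values, and a full analysis would also need the second base and the encoded amino acid.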

Details

1009240
Title
Sampling informational properties of codon usage through the tree of life
Publication title
PLoS One; San Francisco
Volume
20
Issue
11
First page
e0335824
Number of pages
20
Publication year
2025
Publication date
Nov 2025
Section
Research Article
Publisher
Public Library of Science
Place of publication
San Francisco
Country of publication
United States
e-ISSN
1932-6203
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Milestone dates
2025-06-29 (Received); 2025-10-16 (Accepted); 2025-11-26 (Published)
ProQuest document ID
3276035162
Document URL
https://www.proquest.com/scholarly-journals/sampling-informational-properties-codon-usage/docview/3276035162/se-2?accountid=208611
Copyright
© 2025 Martínez et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-27
Database
ProQuest One Academic