High-throughput sequencing projects generate genome-scale sequence data for species-level phylogenies1-3. However, state-of-the-art Bayesian methods for inferring timetrees are computationally limited to small datasets and cannot exploit the growing number of available genomes4. In the case of mammals, molecular-clock analyses of limited datasets have produced conflicting estimates of clade ages with large uncertainties5,6, and thus the timescale of placental mammal evolution remains contentious7-10. Here we develop a Bayesian molecular-clock dating approach to estimate a timetree of 4,705 mammal species integrating information from 72 mammal genomes. We show that increasingly larger phylogenomic datasets produce diversification time estimates with progressively smaller uncertainties, facilitating precise tests of macroevolutionary hypotheses. For example, we confidently reject an explosive model of placental mammal origination in the Palaeogene8 and show that crown Placentalia originated in the Late Cretaceous with unambiguous ordinal diversification in the Palaeocene/Eocene. Our Bayesian methodology facilitates analysis of complete genomes and thousands of species within an integrated framework, making it possible to address hitherto intractable research questions on species diversifications. This approach can be used to address other contentious cases of animal and plant diversifications that require analysis of species-level phylogenomic datasets.
High-throughput sequencing projects are generating hundreds1 to thousands2 of genome sequences, with imminent plans to sequence more than a million species11. However, the accumulation of sequenced genomes is now outpacing the analytical capacity of computer software and many of the tools required to extract information from these vast datasets are lacking12. This is particularly the case for Bayesian Markov chain Monte Carlo (MCMC) molecular-clock methods that are used routinely to infer evolutionary timescales4, for groups including pathogens13, plants14 and animals15, but which are computationally expensive. Consequently, these methods have been limited in their application to datasets comprising dozens of genes for many species5,16 or many genes for dozens of species7,17, constraining the scope of evolutionary questions that can be addressed.
Although fast non-Bayesian clock-dating methods have been developed18, these typically do not incorporate uncertainties on evolutionary branch lengths19 or arbitrary fossil calibration densities20,21. However, the Bayesian approach, despite its computational expense, is appealing because it facilitates explicit integration of these uncertainties4. Furthermore, large genomic datasets enable inference of precise timelines that can be used to obtain correlations between diversification events and the geological and climatic...