Abstract

Background

Approximating the recent phylogeny of \(N\) phased haplotypes at a set of variants along the genome is a core problem in modern population genomics and is central to performing genome-wide screens for association, selection, introgression, and other signals. The Li & Stephens (LS) model provides a simple yet powerful hidden Markov model for inferring the recent ancestry at a given variant, represented as an \(N \times N\) distance matrix based on posterior decodings.

Results

We provide a high-performance engine that makes these posterior decodings readily accessible, with minimal pre-processing, via kalis, an easy-to-use package for the statistical programming language R. kalis enables investigators to rapidly resolve the ancestry at loci of interest and enables developers to build a range of variant-specific ancestral inference pipelines on top of it. kalis exploits both multi-core parallelism and modern CPU vector instruction sets to scale to hundreds of thousands of genomes.

Conclusions

The distance matrices accessible via kalis enable local ancestry, selection, and association studies in modern large-scale genomic datasets.

Details

Title
kalis: a modern implementation of the Li & Stephens model for local ancestry inference in R
Author
Aslett, Louis J M; Christ, Ryan R
Pages
1-18
Section
Software
Publication year
2024
Publication date
2024
Publisher
BioMed Central
e-ISSN
1471-2105
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2956835625
Copyright
© 2024. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.