Abstract

Several recent studies have examined privacy and data-sharing restrictions in the context of genotype imputation servers. In the most typical imputation pipelines, a whole-genome-sequenced haplotype reference panel is compared with genotyped study individuals whose missing data are to be imputed. Two datasets from separate sources therefore come together in one computing environment, where relatively complicated statistical models, specifically hidden Markov models, are applied. We give a short review of the current literature in this domain and observe three prevalent strategies: complicated data encryption, technical solutions for securing the computation environment, and rearrangement of haplotype data in an effort to provide anonymisation. We embarked on a thought experiment to provide a potential fourth type of solution, which federates the different internal tasks within the statistical methods used for imputation. This idea is timely given the current interest in Europe in federated analysis platforms for making combined inferences across multiple genomic data resources. Federating the tasks allows very simple manipulations to protect sensitive individual-level data, so that imputation algorithms can run on simple plain-text files. We illustrate how such a federated imputation server could be put in place and provide associated code, including a simple implementation of the Li-Stephens haplotype mosaic model for imputing missing genotypes. We name our general framework ANONYMP, for anonymised imputation. We demonstrate the concept on simulated data generated with msprime. We show that dividing the calculations required for statistical imputation between several sites is a valuable new avenue in the development of privacy-preserving imputation servers.
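For orientation, the Li-Stephens haplotype mosaic model mentioned above can be sketched in a few lines. The following is a minimal, single-site illustration and not the authors' ANONYMP code: the function name `li_stephens_impute`, the switch probability `rho`, and the mismatch probability `mu` are assumptions chosen for this example. It treats a target haplotype as a mosaic of reference haplotypes, runs a normalised forward-backward pass over the hidden copying states, and reports the posterior-weighted allele at each site.

```python
import numpy as np

def li_stephens_impute(panel, target, rho=0.05, mu=0.01):
    """Posterior allele dosage per site under a basic Li-Stephens HMM.

    panel  : (n_haps, n_sites) array of 0/1 reference alleles
    target : length n_sites array of 0/1 alleles, -1 where missing
    rho    : per-interval probability of switching copied haplotype (assumed)
    mu     : probability that the copied allele is mis-copied (assumed)
    """
    panel = np.asarray(panel)
    target = np.asarray(target)
    n, m = panel.shape

    # Emission: 1 at missing sites, (1 - mu) on a match, mu on a mismatch.
    emis = np.where(target[None, :] < 0, 1.0,
                    np.where(panel == target[None, :], 1.0 - mu, mu))

    # Forward pass; each column is normalised to sum to 1.
    fwd = np.zeros((n, m))
    fwd[:, 0] = emis[:, 0] / n
    fwd[:, 0] /= fwd[:, 0].sum()
    for j in range(1, m):
        fwd[:, j] = emis[:, j] * ((1.0 - rho) * fwd[:, j - 1] + rho / n)
        fwd[:, j] /= fwd[:, j].sum()

    # Backward pass, also normalised column by column.
    bwd = np.ones((n, m))
    for j in range(m - 2, -1, -1):
        w = emis[:, j + 1] * bwd[:, j + 1]
        bwd[:, j] = (1.0 - rho) * w + rho * w.sum() / n
        bwd[:, j] /= bwd[:, j].sum()

    # Posterior over copied haplotypes, then the expected allele per site.
    post = fwd * bwd
    post /= post.sum(axis=0, keepdims=True)
    return (post * panel).sum(axis=0)


if __name__ == "__main__":
    panel = np.array([[0, 0, 0, 0, 0],
                      [1, 1, 1, 1, 1]])
    target = np.array([1, 1, -1, 1, 1])  # middle site is missing
    print(li_stephens_impute(panel, target))
```

In this sketch all data sit in one process. The federated idea described in the abstract would instead divide the component calculations between several sites, which this example does not attempt to show.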

Competing Interest Statement

The authors have declared no competing interest.

Details

Title
A simple demonstration of a privacy-preserving de-centralised genotype imputation workflow
Publication title
bioRxiv; Cold Spring Harbor
Publication year
2025
Publication date
Jan 15, 2025
Section
New Results
Publisher
Cold Spring Harbor Laboratory Press
Source
BioRxiv
Place of publication
Cold Spring Harbor
Country of publication
United States
University/institution
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
Document type
Working Paper
ProQuest document ID
3155862050
Document URL
https://www.proquest.com/working-papers/simple-demonstration-privacy-preserving-de/docview/3155862050/se-2?accountid=208611
Copyright
© 2025. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-01-16
Database
ProQuest One Academic