Abstract

Background: Repetitive genome regions, such as variable number of tandem repeats (VNTR) or short tandem repeats (STR), are major constituents of the uncharted dark genome and evade conventional sequencing approaches. The protein-coding LPA kringle IV type 2 (KIV-2) VNTR (5.6 kb per unit, 1-40 units per allele) is a medically highly relevant example with a particularly intricate structure, multiple haplotypes, intragenic homologies and an intra-VNTR STR. It is the primary regulator of plasma lipoprotein(a) [Lp(a)] concentrations, an important cardiovascular risk factor. Although Lp(a) variance is mostly genetically determined, Lp(a) concentrations vary widely between individuals and ancestries. This VNTR region hides multiple causal variants and functional haplotypes.

Methods: We evaluated the performance of amplicon-based nanopore sequencing with unique molecular identifiers (UMI-ONT-Seq) for SNP detection, haplotype mapping, VNTR unit consensus sequence generation and copy number estimation via coverage-corrected haplotype quantification in the KIV-2 VNTR. We used 15 human samples and low-level mixtures (0.5% to 5%) of KIV-2 plasmids as a validation set. We then applied UMI-ONT-Seq to extract KIV-2 VNTR haplotypes in 48 multi-ancestry 1000 Genomes samples and analyzed at scale a poorly characterized STR within the KIV-2 VNTR.

Results: UMI-ONT-Seq detected KIV-2 SNPs down to 1% variant level with high sensitivity, specificity and precision (0.977 ± 0.018; 1.000 ± 0.0005; 0.993 ± 0.02) and accurately retrieved the full-length haplotype of each VNTR unit. Human variant levels were highly correlated with next-generation sequencing (R² = 0.983) without bias across the whole variant level range. Six reads per UMI produced sequences of each KIV-2 unit with Q40 quality. The KIV-2 repeat number determined by coverage-corrected unique haplotype counting was in close agreement with droplet digital PCR (ddPCR), with 70% of the samples falling within the narrow confidence interval of ddPCR. We then analyzed 62,679 intra-KIV-2 STR sequences and identified ancestry-specific STR patterns. Finally, we characterized the KIV-2 haplotype patterns across multiple ancestries.

Conclusions: UMI-ONT-Seq accurately retrieves the SNP haplotype of each repeat unit and precisely quantifies the VNTR copy number of the complex KIV-2 VNTR region across multiple ancestries. Using the KIV-2 VNTR as an example, this study presents a novel and potent tool for comprehensive characterization of medically relevant complex genome regions at scale.
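The performance figures quoted in the Results (sensitivity, specificity, precision, and the Q40 consensus quality) follow standard definitions. The sketch below illustrates those definitions only; it is not the authors' pipeline. The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion counts.

    tp/fp/tn/fn are true/false positive/negative variant calls.
    """
    sensitivity = tp / (tp + fn)  # share of true variants that were called
    specificity = tn / (tn + fp)  # share of non-variant sites left uncalled
    precision = tp / (tp + fp)    # share of calls that are true variants
    return sensitivity, specificity, precision


def phred_to_error_probability(q):
    """Convert a Phred quality score Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the study).
    sens, spec, prec = classification_metrics(tp=90, fp=10, tn=890, fn=10)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} precision={prec:.3f}")
    # Q40 corresponds to roughly one expected error per 10,000 bases.
    print(f"Q40 error probability: {phred_to_error_probability(40):.0e}")
```

Under the Phred scale, Q40 corresponds to an expected error rate of about 1 in 10,000 bases, which is the accuracy the abstract reports for consensus sequences built from six reads per UMI.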

Competing Interest Statement

Stefan Coassin has received honoraria from Novartis AG (Basel, CH) and Silence Therapeutics PLC (London, UK) for consultancy on LPA genetics. Florian Kronenberg has received honoraria from Novartis AG, CRISPR Therapeutics, Silence Therapeutics, Roche and Amgen for consultancy on lipoprotein(a), as well as lecture fees. Lukas Forer has received honoraria from Novartis AG (Basel, CH) for consultancy related to lipoprotein(a).

Details

Title
Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR
Author
Amstler, Stephan; Streiter, Gertraud; Pfurtscheller, Cathrin; Forer, Lukas; Di Maio, Silvia; Weissensteiner, Hansi; Paulweber, Bernhard; Schoenherr, Sebastian; Kronenberg, Florian; Coassin, Stefan
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2024
Publication date
Mar 5, 2024
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2937452625
Copyright
© 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.