Abstract

The Telomere-to-Telomere Consortium recently finished the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on the semi-manual combination of long, accurate PacBio HiFi and ultra-long Oxford Nanopore sequencing reads. The Verkko assembler later automated this process, achieving complete assemblies for approximately half of the chromosomes in a diploid human genome. However, the first version of Verkko was computationally expensive and could not resolve all regions of a typical human genome. Here we present Verkko2, which implements a more efficient read correction algorithm, improves repeat resolution and gap closing, introduces proximity-ligation-based haplotype phasing and scaffolding, and adds support for multiple long-read data types. These enhancements allow Verkko to assemble all regions of a diploid human genome, including the short arms of the acrocentric chromosomes and both sex chromosomes. Together, these changes double the number of telomere-to-telomere scaffolds, reduce runtime fourfold, and improve assembly correctness. On a panel of 19 human genomes, Verkko2 assembles an average of 39 of 46 complete chromosomes as scaffolds, with 21 of these assembled as gapless contigs. These improvements enable telomere-to-telomere comparative genomics and pangenomics at scale.

Competing Interest Statement

Sergey Koren has received travel funds to speak at events hosted by Oxford Nanopore Technologies. Sergey Nurk is an employee of Oxford Nanopore Technologies. The remaining authors declare no competing interests.

Footnotes

* Fixing missing supplemental files. No changes in main text.

Details

Title
Verkko2: Integrating proximity ligation data with long-read De Bruijn graphs for efficient telomere-to-telomere genome assembly, phasing, and scaffolding
Author
Phillippy, Adam M; Koren, Sergey; Antipov, Dmitry; Rautiainen, Mikko; Nurk, Sergey; Solar, Steven J; Walenz, Brian P
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2024
Publication date
Dec 26, 2024
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
Copyright
© 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.