Abstract

Genome sequences are computationally assembled from millions of much shorter sequencing reads. Although this process can be impressively accurate with long reads, it is still subject to several types of error, including large structural misassemblies as well as localised base pair substitutions. Recent advances in long single-molecule sequencing, in combination with other long-range technologies such as synthetic long read clouds and Hi-C, have dramatically increased the contiguity of assemblies. This makes it all the more important to be able to validate the structural integrity of the chromosomal-scale assemblies now being generated. Here we describe a novel assembly evaluation tool, Asset, which evaluates the consistency of a proposed genome assembly with multiple primary long-range data sets, identifying both supported regions and putative structural misassemblies. We present tests on three de novo assemblies, from a human, a goat and a fish species, demonstrating that Asset can identify structural misassemblies accurately by combining regionally supported evidence from long read and other raw sequencing data. Asset can be used to assess overall assembly confidence and to discover specific problematic regions for downstream genome curation, a process that improves genome quality; it can also provide feedback to automated assembly pipelines.

Competing Interest Statement

R.D. is a consultant for Dovetail Inc.

Footnotes

* Fixed several typos and grammatical errors.

Details

Title
Genome sequence assembly evaluation using long-range sequencing data
Author
Guan, Dengfeng; McCarthy, Shane; Wood, Jonathan; Ning, Zemin; Sims, Ying; Chow, William; Howe, Kerstin; Wang, Guohua; Wang, Yadong; Durbin, Richard
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2022
Publication date
Jun 30, 2022
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2661731837
Copyright
© 2022. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.