Abstract

Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications [1]. Previous graph genome software implementations [2,3,4] have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays [5], improving accuracy over alignment to a linear reference and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.

Details

Title
Variation graph toolkit improves read mapping by representing genetic variation in the reference
Author
Garrison, Erik 1; Sirén, Jouni 1; Novak, Adam M 2; Hickey, Glenn 2; Eizenga, Jordan M 2; Dawson, Eric T 3; Jones, William 1; Garg, Shilpa 4; Markello, Charles 2; Lin, Michael F 5; Paten, Benedict 2; Durbin, Richard 6

1 Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
2 UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
3 Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK; National Cancer Institute, Rockville, Maryland, USA; Department of Genetics, University of Cambridge, Cambridge, UK
4 Max-Planck-Institut für Informatik, Saarbrücken, Germany
5 DNAnexus, Mountain View, California, USA
6 Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK; Department of Genetics, University of Cambridge, Cambridge, UK
Pages
875-879
Publication year
2018
Publication date
Oct 2018
Publisher
Nature Publishing Group
ISSN
10870156
e-ISSN
15461696
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2319190927
Copyright
Copyright Nature Publishing Group Oct 2018