ABSTRACT Comprehensive whole-genome structural variation detection is challenging with current approaches. Because cells are diploid and the genome contains numerous repetitive elements, short-read DNA sequencing cannot detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. Although whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, because molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against the map derived from the reference human genome, and identified structural variations >5 kb in size. We find that these individuals have many more structural variants than previously published, including some with the potential to disrupt gene function or regulation.
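The map comparison summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline, and several parts are assumptions made only for illustration. A reference label map is derived in silico by locating a label motif on both strands, and the motif used in the test is a hypothetical example. Sample labels are assumed to be already aligned one-to-one to reference labels, whereas real map aligners must also tolerate missing or extra labels and sizing error. Any interval whose length differs from the reference by more than 5 kb is flagged as a putative insertion or deletion.

```python
from dataclasses import dataclass


@dataclass
class SizeDiscrepancy:
    """One label-to-label interval whose length differs between sample and reference."""
    ref_start: int        # reference coordinate of the left flanking label (bp)
    ref_end: int          # reference coordinate of the right flanking label (bp)
    ref_interval: int     # interval length in the reference map (bp)
    sample_interval: int  # interval length in the sample map (bp)

    @property
    def delta(self) -> int:
        # Positive when the sample interval is longer than the reference interval.
        return self.sample_interval - self.ref_interval

    @property
    def kind(self) -> str:
        # A longer sample interval suggests an insertion; a shorter one, a deletion.
        return "insertion" if self.delta > 0 else "deletion"


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_positions(sequence: str, motif: str) -> list[int]:
    """In silico label map: sorted start positions of the motif on either strand."""
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = sequence.find(pattern)
        while start != -1:
            hits.add(start)
            start = sequence.find(pattern, start + 1)
    return sorted(hits)


def call_interval_svs(ref_labels: list[int], sample_labels: list[int],
                      min_size: int = 5000) -> list[SizeDiscrepancy]:
    """Flag intervals whose length differs by more than min_size bp.

    Assumes (hypothetically) that sample_labels[i] is already aligned to ref_labels[i].
    """
    if len(ref_labels) != len(sample_labels):
        raise ValueError("this sketch assumes a one-to-one label alignment")
    calls = []
    for i in range(len(ref_labels) - 1):
        ref_iv = ref_labels[i + 1] - ref_labels[i]
        sample_iv = sample_labels[i + 1] - sample_labels[i]
        if abs(sample_iv - ref_iv) > min_size:
            calls.append(SizeDiscrepancy(ref_labels[i], ref_labels[i + 1],
                                         ref_iv, sample_iv))
    return calls


if __name__ == "__main__":
    ref = [0, 20_000, 40_000, 60_000]
    sample = [0, 20_000, 47_000, 67_000]  # 7-kb longer second interval
    for call in call_interval_svs(ref, sample):
        print(call.kind, call.ref_start, call.ref_end, call.delta)
    # prints: insertion 20000 40000 7000
```

Comparing interval lengths rather than absolute positions keeps a single size change from shifting every downstream call, which is why the sketch reports each discrepancy against its two flanking reference labels.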
KEYWORDS biotechnology; genome mapping; structural variation detection
WHOLE-GENOME short-read sequencing is now routine and affordable. However, three challenges remain in genome analysis: genome sequence assembly, structural variation detection, and separation of the two parental genomes. Humans are diploid, with cells harboring one genome from each parent, and the genome contains numerous repetitive elements longer than the usual sequencing library insert size; together, these features make it close to impossible to assemble genome sequences with short-read sequencing alone (El-Metwally et al. 2013). Consequently, almost all whole-genome sequencing projects map the sequencing reads onto the human reference genome sequence without performing whole-genome assemblies (Ley et al. 2008). When whole-genome assembly is attempted, it relies on the laborious and expensive approach of paired-end sequencing of cloned genomic DNA fragments to provide scaffolds for sequence assembly (Siegel et al. 2000). Aligning short sequencing reads to the human reference genome sequence reveals single-nucleotide variants and small indels in the sequenced individuals, but larger structural variants and repetitive regions in the genome are more difficult to detect. Because structural variation can disrupt genes or regulatory elements, whole-genome sequencing without assembly and structural variation detection produces an incomplete picture of the genome. Recently, clone-free approaches (e.g., Hi-C scaffolding) have been used to generate sequence motif maps or long sequences that serve as scaffolds for assembling highly accurate short-read sequences (Burton et al. 2013; Kaplan and...