Content area
Full Text
P R I M E R
How to apply de Bruijn graphs to genome assembly
2011 Nature America, Inc. All rights reserved.
Phillip E C Compeau, Pavel A Pevzner & Glenn Tesler
A mathematical concept known as a de Bruijn graph turns the formidable challenge of assembling a contiguous genome from billions of short sequencing reads into a tractable computational problem.
The development of algorithmic ideas for next-generation sequencing can be
traced back 300 years to the Prussian city of Knigsberg (present-day Kaliningrad, Russia), where seven bridges joined the four parts of the city located on opposing banks of the Pregel River and two river islands (Fig. 1a). At the time, Knigsbergs residents enjoyed strolling through their city, and they
wondered if every part of the city could be visited by
walking across each of the seven bridges exactly once and returning to ones starting location.The solution came in 1735, when the great mathematician Leonhard Euler1 made a conceptual breakthrough that would solve this Bridges of Knigsberg problem. Eulers first insight was to represent each landmass as a point (called a node) and each bridge as a line segment (called an edge) connecting the appropriate two points. This creates a grapha network of nodes connected by edges (Fig. 1b).By describing a procedure for determining whether an arbitrary graph contains a path that visits every edge exactly once and returns to where it started, Euler not only resolved the Bridges of Knigsberg problem but also effectively launched the entire branch of mathematics known today as graph theory2.
Since Eulers original description, the use of graph theory has turned out to have many
a b
Figure 1 Bridges of Knigsberg problem. (a) A map of old Knigsberg, in which each area of the city is labeled with a different color point. (b) The Knigsberg Bridge graph, formed by representing each of four land areas as a node and each of the citys seven bridges as an edge.
additional practical applications, most of which have greater scientific importance than the development of walking itineraries. Specifically, Eulers ideas were subsequently adapted by Dutch mathematician Nicolaas de Bruijn to find a cyclic sequence of letters taken from a given alphabet for which every possible word of a certain length (k)...