BRIEF COMMUNICATIONS
Fast gapped-read alignment with Bowtie 2
© 2012 Nature America, Inc. All rights reserved.
Ben Langmead1,2 & Steven L Salzberg1–3
As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Aligning sequencing reads to a reference genome is the first step in many comparative genomics pipelines, including pipelines for variant calling1, isoform quantitation2 and differential gene expression3. In many cases, the alignment step is the slowest. This is because for each read the aligner must solve a difficult computational problem: determining the read's likely point of origin with respect to a reference genome4.
Many aligners use a genome index to rapidly narrow the list of candidate alignment locations. The full-text minute index5 is a fast and memory-efficient index that has been used in recent aligners6–10. Index-assisted aligners work by searching for all ways of mutating the read string into a string that occurs in the reference, subject to an alignment policy limiting the number of differences. Although this search space is large, many portions of it can be skipped (pruned) without loss of sensitivity. In practice, pruning strategies such as double indexing6 and bidirectional Burrows-Wheeler transform (BWT)7 facilitate very efficient ungapped alignment of short reads.
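The index-assisted search described above can be illustrated with a toy full-text minute (FM) index. This is a minimal sketch, not Bowtie 2's implementation: it assumes a DNA alphabet, builds the suffix array naively, and uses a substitution-only policy (a maximum number of mismatches, no gaps). The names `build_fm_index` and `search` are hypothetical, and neither double indexing nor the bidirectional BWT is implemented. The sketch shows only the two basic pruning rules: a branch stops as soon as its suffix-array range becomes empty (the mutated string does not occur in the reference) or as soon as it exceeds the mismatch budget.

```python
from typing import Dict, List, Tuple

# (BWT, suffix array, C table, occurrence table)
FMIndex = Tuple[str, List[int], Dict[str, int], Dict[str, List[int]]]


def build_fm_index(reference: str) -> FMIndex:
    """Build a toy FM index: BWT, suffix array, C table and occurrence table."""
    text = reference + "$"  # '$' terminates the text and sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] == '$' when i == 0
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text that sort strictly before c
    C: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][k] = number of occurrences of c in bwt[:k]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return bwt, sa, C, occ


def search(read: str, index: FMIndex, max_mismatches: int) -> List[Tuple[int, int]]:
    """Return (reference offset, mismatches) for every hit within the policy."""
    bwt, sa, C, occ = index
    hits = set()

    def extend(i: int, lo: int, hi: int, mismatches: int) -> None:
        # [lo, hi) is the suffix-array range of read[i+1:] after substitutions
        if lo >= hi:
            return  # prune: the mutated suffix does not occur in the reference
        if i < 0:
            hits.update((sa[k], mismatches) for k in range(lo, hi))
            return
        for c in "ACGT":
            if c not in C:
                continue  # base absent from the reference
            cost = mismatches + (c != read[i])
            if cost > max_mismatches:
                continue  # prune: policy limit on differences exceeded
            # LF mapping: prepend c to the current match (one backward-search step)
            extend(i - 1, C[c] + occ[c][lo], C[c] + occ[c][hi], cost)

    extend(len(read) - 1, 0, len(bwt), 0)
    return sorted(hits)


idx = build_fm_index("ACGTACGGACGT")
print(search("ACGT", idx, 0))  # [(0, 0), (8, 0)]
print(search("ACGT", idx, 1))  # [(0, 0), (4, 1), (8, 0)]
```

Allowing a single substitution already multiplies the branches explored at every read position; permitting insertions and deletions as well would add still more branches per step, which is why the text turns next to gaps.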
Index-aided alignment can be quite inefficient, however, when alignments are permitted to contain gaps. Alignment gaps can result either from sequencing errors or from true insertions and deletions. Ungapped aligners such as Bowtie will usually fail to align reads spanning gaps and will therefore miss evidence for these events. Gaps greatly increase the size of the search space and reduce the effectiveness of pruning, thereby substantially slowing aligners built solely on index-assisted alignment. Bowtie 2 extends the full-text minute index-based approach of Bowtie to permit gapped alignment by dividing the algorithm broadly into two stages: an initial, ungapped seed-finding stage that benefits from the speed and memory efficiency of the full-text minute index and a gapped extension stage that uses dynamic...
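The two-stage split into ungapped seeding and gapped extension can be sketched in miniature. This is a minimal illustration under stated assumptions, not Bowtie 2's algorithm. Seeds are fixed-length substrings taken at a fixed interval along the read, `str.find` stands in for a full-text minute index lookup, and the extension is a plain scalar (not hardware-accelerated) dynamic programming alignment with a hypothetical linear scoring scheme (match +2, mismatch -3, gap -5). All function names and parameter values (`seed_len`, `interval`, `pad`) are illustrative. Each exact seed hit implies a candidate diagonal. The dynamic programming step then aligns the whole read end to end against a padded reference window around that diagonal, so a gap that an ungapped search would miss costs only a penalty.

```python
from typing import List, Optional, Tuple

# Hypothetical linear scoring scheme chosen for illustration only.
MATCH, MISMATCH, GAP = 2, -3, -5


def seeds(read: str, seed_len: int, interval: int) -> List[Tuple[int, str]]:
    """Fixed-length (offset, substring) seeds taken at a fixed interval."""
    return [(off, read[off:off + seed_len])
            for off in range(0, len(read) - seed_len + 1, interval)]


def exact_hits(ref: str, pattern: str) -> List[int]:
    """Stand-in for an FM-index lookup: every exact occurrence of pattern."""
    hits, pos = [], ref.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = ref.find(pattern, pos + 1)
    return hits


def fit_align(read: str, window: str) -> int:
    """Gapped DP: the whole read must align; window ends are free."""
    prev = [0] * (len(window) + 1)  # row 0: skipping leading window bases is free
    for i in range(1, len(read) + 1):
        cur = [prev[0] + GAP] + [0] * len(window)  # column 0: read bases vs gaps
        for j in range(1, len(window) + 1):
            sub = MATCH if read[i - 1] == window[j - 1] else MISMATCH
            cur[j] = max(prev[j - 1] + sub,  # match or mismatch
                         prev[j] + GAP,      # read base against a gap
                         cur[j - 1] + GAP)   # window base against a gap
        prev = cur
    return max(prev)  # skipping trailing window bases is free


def align(read: str, ref: str, seed_len: int = 4, interval: int = 3,
          pad: int = 3) -> Optional[Tuple[int, int]]:
    """Seed-and-extend: return (best score, seed-implied start) or None."""
    best: Optional[Tuple[int, int]] = None
    tried = set()
    for off, seed in seeds(read, seed_len, interval):
        for hit in exact_hits(ref, seed):
            diag = hit - off  # where the read would start if it had no gaps
            if diag in tried:
                continue  # several seeds often support the same placement
            tried.add(diag)
            lo, hi = max(0, diag - pad), min(len(ref), diag + len(read) + pad)
            score = fit_align(read, ref[lo:hi])
            if best is None or score > best[0]:
                best = (score, diag)
    return best


ref = "TTTTT" + "GATTACAGGAT" + "TTTTT"  # contains the read below minus its second 'C'
print(align("GATTACAGGCAT", ref))  # (17, 5): 11 matches and one gap
```

In this example the best ungapped placement of the read scores only 14 (10 matches, 2 mismatches). The gapped extension recovers the deletion at a cost of one gap penalty, even though the seed covering the deleted base ("AGGC") has no exact hit. The returned start is the seed-implied diagonal, not a traced-back alignment start.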