BRIEF COMMUNICATIONS
UPARSE: highly accurate OTU sequences from microbial amplicon reads
© 2013 Nature America, Inc. All rights reserved.
Robert C Edgar
Amplified marker-gene sequences can be used to understand microbial community structure, but they suffer from a high level of sequencing and amplification artifacts. The UPARSE pipeline reports operational taxonomic unit (OTU) sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% incorrect bases commonly reported by other methods. The improved accuracy results in far fewer OTUs, consistently closer to the expected number of species in a community.
A number of recent large-scale studies have taken advantage of next-generation sequencing to characterize microbial community structure and function, including the Human Microbiome Project (HMP)1 and a survey of the Arabidopsis thaliana root microbiome2. Many of these projects assess community structure by sequencing amplified markers, such as the 16S ribosomal RNA gene, which are organized into OTUs: groups of sequences that are intended to correspond to taxonomic clades or monophyletic groups. Yet data analysis in this type of study is hampered by ubiquitous artifacts introduced by amplification and sequencing. Current techniques for reducing artifacts include quality filtering of reads3, denoising of flowgrams4–6, chimera filtering6,7 and clustering8, but many biases and spurious OTUs due to unfiltered artifacts often remain, confounding inferences of community structure and function9. A large fraction of OTU representative sequences produced by recommended procedures with commonly used metagenomic sequence analysis pipelines6,9,10 on artificial (mock) communities of known composition have <97% identity with true biological sequences, a divergence generally considered sufficient to infer a new species8, and the number of OTUs often far exceeds the number of expected species.
Independent Investigator, Tiburon, California, USA. Correspondence should be addressed to R.C.E. ([email protected]). Received 7 December 2012; accepted 15 July 2013; published online 18 August 2013; DOI:10.1038/NMETH.2604 (http://www.nature.com/doifinder/10.1038/nmeth.2604).
I have developed a pipeline (UPARSE, http://drive5.com/uparse/ and Supplementary Software) for constructing OTUs de novo from next-generation reads that achieves high accuracy in biological sequence recovery and improves richness estimates on mock communities. UPARSE works by quality-filtering reads, trimming them to a fixed length, optionally discarding singleton reads and then clustering the remaining reads. Clustering uses UPARSE-OTU, a novel greedy algorithm that performs chimera filtering and OTU clustering simultaneously, unlike previously developed...
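The stages named above (quality filtering, trimming to a fixed length, optional singleton removal, greedy clustering) can be illustrated with a minimal Python sketch. It is not the UPARSE implementation. The function names, the mean-Phred filter, the ungapped position-by-position identity and the abundance-ordered greedy pass are all assumptions chosen for brevity, and the sketch performs no chimera filtering, which UPARSE-OTU does during clustering.

```python
from collections import Counter


def preprocess(reads, trim_len, min_mean_q):
    """Quality-filter reads and trim them to a fixed length.

    reads: iterable of (sequence, list of Phred scores) pairs.
    The mean-quality criterion is a hypothetical stand-in for a real filter.
    """
    kept = []
    for seq, quals in reads:
        if len(seq) < trim_len:
            continue  # too short to be trimmed to the fixed length
        seq, quals = seq[:trim_len], quals[:trim_len]
        if sum(quals) / len(quals) >= min_mean_q:
            kept.append(seq)
    return kept


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences.

    Assumption: no alignment is performed; fixed-length trimming makes the
    sequences directly comparable position by position.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, min_identity=0.97, discard_singletons=True):
    """Greedy centroid clustering (illustrative only, no chimera detection).

    Returns a list of [centroid_sequence, total_read_count] entries.
    """
    counts = Counter(seqs)  # dereplicate: unique sequence -> abundance
    if discard_singletons:
        counts = {s: n for s, n in counts.items() if n > 1}
    # Assumption: visit unique sequences from most to least abundant;
    # ties are broken alphabetically so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    otus = []
    for seq, n in ordered:
        for otu in otus:
            if identity(seq, otu[0]) >= min_identity:
                otu[1] += n  # close enough to an existing centroid
                break
        else:
            otus.append([seq, n])  # no centroid matched: start a new OTU
    return otus
```

Calling `greedy_otus(preprocess(reads, trim_len, min_mean_q))` on a list of `(sequence, qualities)` pairs returns one entry per OTU. The 0.97 default mirrors the 97% identity threshold mentioned in the text; everything else is a placeholder to be replaced by the pipeline's actual criteria.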