NEWS AND VIEWS
How deep is enough in single-cell RNA-seq?
Aaron M Streets & Yanyi Huang
Guidelines for determining sequencing depth facilitate transcriptome profiling of single cells in heterogeneous populations.
© 2014 Nature America, Inc. All rights reserved.
In recent years, single-cell RNA-seq has emerged as a powerful new approach for characterizing the cell types present in a mixed population. These studies usually involve a trade-off between the number of samples analyzed and the number of RNA transcripts sequenced per cell, or sequencing depth, that can be achieved. In this issue, Pollen et al.1 present quantitative guidelines for determining the sequencing depth necessary to distinguish the cell types in a complex sample. Using a commercial microfluidic platform to capture hundreds of cells from a variety of human tissues and performing RNA-seq to different depths, they demonstrate accurate and reliable classification of cell types at a sequencing depth of only 50,000 reads per cell (Fig. 1a), about two orders of magnitude fewer than has typically been reported.
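The trade-off between cell number and depth comes down to simple arithmetic, which the following minimal Python sketch illustrates. The run budget of 200 million reads is a hypothetical figure chosen for illustration, not a number from the article, and the function name `cells_per_run` is likewise our own.

```python
def cells_per_run(total_reads: int, reads_per_cell: int) -> int:
    """Return how many cells fit in one run at a given per-cell depth."""
    if reads_per_cell <= 0:
        raise ValueError("reads_per_cell must be positive")
    return total_reads // reads_per_cell


# Hypothetical read budget for one sequencing run (an assumption).
RUN_BUDGET = 200_000_000

# Compare 50,000 reads per cell with depths one and two orders of magnitude higher.
for depth in (5_000_000, 500_000, 50_000):
    n_cells = cells_per_run(RUN_BUDGET, depth)
    print(f"{depth:>9,} reads/cell -> {n_cells:>5,} cells")
```

Under these assumed numbers, lowering the depth a hundredfold raises the number of cells that can share a run a hundredfold, which is why a reliable lower bound on depth matters for surveying heterogeneous populations.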
Identification of cell types in mixed populations has long been done using known biomarkers analyzed by fluorescence-activated cell sorting or multiplexed, quantitative PCR. In contrast, the complete transcriptional profiles generated by single-cell RNA-seq allow cells to be identified objectively without a priori knowledge of biomarkers. This approach also enables the discovery of novel biomarkers.
In a typical single-cell RNA-seq experiment, tens to hundreds or even thousands of single cells are isolated from a tissue or culture, and the transcriptome of each cell is reverse transcribed into cDNA. The cDNA is then amplified and further processed for next-generation sequencing. The output of this pipeline is a list of sequence fragments called reads. Mapping of the reads to the reference genome produces estimates of normalized gene expression levels in an N × L matrix, where N is the number of cells and L is the total number of genes identified among all the cells (Fig. 1b).
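As a rough illustration of how per-cell read counts become an N × L matrix, the Python sketch below assumes that mapping has already produced a gene-to-count table for each cell. It normalizes to counts per million, which is one simple choice made here for illustration; the article does not specify a normalization, and the function name and gene labels are placeholders.

```python
def expression_matrix(counts_per_cell):
    """Build an N x L matrix of counts per million (CPM).

    counts_per_cell: list of dicts mapping gene name -> mapped read count.
    Returns (genes, matrix), where matrix[i][j] is the CPM of genes[j] in cell i.
    """
    # L is the union of all genes detected in at least one cell.
    genes = sorted({gene for cell in counts_per_cell for gene in cell})
    matrix = []
    for cell in counts_per_cell:
        total = sum(cell.values())
        # Scale each cell to one million reads; a cell with no reads stays all zero.
        scale = 1_000_000 / total if total else 0.0
        matrix.append([cell.get(gene, 0) * scale for gene in genes])
    return genes, matrix


# Toy input: three cells with placeholder gene names (not real data).
cells = [
    {"GENE_A": 30, "GENE_B": 70},
    {"GENE_B": 10, "GENE_C": 90},
    {"GENE_A": 50, "GENE_C": 50},
]
genes, matrix = expression_matrix(cells)
print(genes)
for row in matrix:
    print(row)
```

Genes that a cell does not express appear as zeros, so every row has the same length L even though each cell detects a different subset of genes.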
Statistical analysis is then used to find trends in gene expression across many single cells. A common technique is unsupervised hierarchical clustering, which takes the N × L gene expression matrix and re-orders the row and column indices to minimize the difference between expression levels of adjacent...