REVIEWS
SINGLE-CELL OMICS
Computational and analytical challenges in single-cell transcriptomics
Oliver Stegle1, Sarah A. Teichmann1,2 and John C. Marioni1,2
Abstract | The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression. Alongside the technological breakthroughs that have facilitated the large-scale generation of single-cell transcriptomic data, it is important to consider the specific computational and analytical challenges that still have to be overcome. Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.
Cell identity and function can be characterized at the molecular level by unique transcriptomic signatures1. At the organismal level, different tissues have distinct gene expression profiles2,3, and even cells in consecutive stages of embryonic development have highly divergent transcriptomic landscapes4. Consequently, mutations that alter these expression profiles have been associated with adverse phenotypes ranging from a delayed immune response5 to disease6.
Until recently, molecular fingerprints were generated using profiling of gene expression levels from bulk populations of millions of input cells7. These ensemble-based approaches, whether performed using microarrays8 or the next-generation sequencing (NGS) approach of high-throughput RNA sequencing (RNA-seq)9–11, meant that the resulting expression value for each gene was an average of its expression levels across a large population of input cells. In many contexts, such bulk expression profiles are sufficient. For example, in comparative transcriptomics, the goal is to study the selection pressures that apply to gene expression levels between samples of the same tissue taken from different species. In this context, a global view of average gene expression levels in each tissue, which can be obtained from bulk RNA-seq, may be sufficient2,12. Similarly, gene expression signatures obtained using ensemble approaches have yielded biomarkers that are predictive for disease status and clinical progression13.
However, there are also important biological questions for which bulk measures of gene expression are insufficient14. For instance, during early development, there are...