Abstract

As the number of single‐cell transcriptomics datasets grows, the natural next step is to integrate the accumulating data to achieve a common ontology of cell types and states. However, it is not straightforward to compare gene expression levels across datasets and to automatically assign cell type labels in a new dataset based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of scRNA‐seq data, while accounting for uncertainty caused by biological and measurement noise. We also introduce single‐cell ANnotation using Variational Inference (scANVI), a semi‐supervised variant of scVI designed to leverage existing cell state annotations. We demonstrate that scVI and scANVI compare favorably to state‐of‐the‐art methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings. In contrast to existing methods, scVI and scANVI integrate multiple datasets with a single generative model that can be directly used for downstream tasks, such as differential expression. Both methods are easily accessible through scvi‐tools.

Details

Title
Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models
Author
Xu, Chenling 1; Lopez, Romain 2; Mehlman, Edouard 3; Regier, Jeffrey 4; Jordan, Michael I 5; Yosef, Nir 6

1 Center for Computational Biology, University of California, Berkeley, CA, USA
2 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
3 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA; Centre de Mathématiques Appliquées, École polytechnique, Palaiseau, France
4 Department of Statistics, University of Michigan, Ann Arbor, MI, USA
5 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA; Department of Statistics, University of California, Berkeley, CA, USA
6 Center for Computational Biology, University of California, Berkeley, CA, USA; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA; Ragon Institute of MGH, MIT and Harvard, Boston, MA, USA; Chan‐Zuckerberg Biohub Investigator, San Francisco, CA, USA
Section
Articles
Publication year
2021
Publication date
Jan 2021
Publisher
EMBO Press
e-ISSN
1744-4292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2482473408
Copyright
© 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.