Abstract

Single-cell RNA-seq technologies have been successfully employed over the past decade to generate many high-resolution cell atlases. These have proved invaluable in recent efforts aimed at understanding the cell type specificity of host genes involved in SARS-CoV-2 infections. While single-cell atlases are based on well-sampled, highly expressed genes, many of the genes of interest for understanding SARS-CoV-2 can be expressed at very low levels. Common assumptions underlying standard single-cell analyses do not hold when examining lowly expressed genes, so standard workflows can produce misleading results.

Key Points

* Lowly expressed genes in single-cell RNA-seq can easily be misanalyzed

* log(1+x) count normalization introduces errors for lowly expressed genes

* The average log(1+x) expression differs considerably from log(x) when x is small

* An alternative approach is to use the fraction of cells with non-zero expression
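
The key points above can be illustrated with a small numerical sketch. The counts below are hypothetical values chosen for illustration, not data from the paper, and the helper names (`mean_log1p`, `log1p_of_mean`, `fraction_nonzero`) are assumptions of this example rather than functions from the authors' repository. The sketch compares the per-cell average of log(1+x) with log(1+mean) for a lowly expressed and a highly expressed gene, and computes the fraction of cells with non-zero counts.

```python
import math

def mean_log1p(counts):
    """Average of log(1 + x) taken cell by cell."""
    return sum(math.log1p(x) for x in counts) / len(counts)

def log1p_of_mean(counts):
    """log(1 + mean count), i.e. the transform applied after averaging."""
    return math.log1p(sum(counts) / len(counts))

def fraction_nonzero(counts):
    """Fraction of cells in which the gene has at least one count."""
    return sum(1 for x in counts if x > 0) / len(counts)

# Hypothetical counts for one gene across 10 cells (illustrative only).
low_gene = [0, 0, 0, 0, 0, 0, 0, 0, 1, 3]                 # mean count 0.4
high_gene = [90, 110, 100, 95, 105, 100, 98, 102, 97, 103]  # mean count 100

# Gap between "transform the mean" and "average the transform".
gap_low = log1p_of_mean(low_gene) - mean_log1p(low_gene)
gap_high = log1p_of_mean(high_gene) - mean_log1p(high_gene)

# Express each gap relative to log(1 + mean) so the two genes are comparable.
rel_gap_low = gap_low / log1p_of_mean(low_gene)
rel_gap_high = gap_high / log1p_of_mean(high_gene)

# For a positive mean below 1, log(mean) is negative while log(1 + mean)
# is positive, so the two transforms do not even agree in sign.
low_mean = sum(low_gene) / len(low_gene)
log_of_low_mean = math.log(low_mean)

print(f"low gene:  mean log1p={mean_log1p(low_gene):.4f}, "
      f"log1p(mean)={log1p_of_mean(low_gene):.4f}, "
      f"log(mean)={log_of_low_mean:.4f}, "
      f"fraction non-zero={fraction_nonzero(low_gene):.2f}")
print(f"high gene: mean log1p={mean_log1p(high_gene):.4f}, "
      f"log1p(mean)={log1p_of_mean(high_gene):.4f}")
print(f"relative gap: low={rel_gap_low:.3f}, high={rel_gap_high:.5f}")
```

In this toy setting the relative gap is large for the lowly expressed gene and negligible for the highly expressed one, and the average log(1+x) of the lowly expressed gene lands close to its fraction of non-zero cells. This is only a sketch under the stated assumptions; real data will show more varied behavior.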

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* Incorporated feedback from six reviewers after a rejection from Briefings in Bioinformatics.

* https://github.com/pachterlab/BP_2020_2

Details

Title
Normalization of single-cell RNA-seq counts by log(x+1)* or log(1+x)*
Author
Booeshaghi, A Sina; Pachter, Lior
University/institution
Cold Spring Harbor Laboratory Press
Section
Contradictory Results
Publication year
2020
Publication date
Oct 14, 2020
Publisher
Cold Spring Harbor Laboratory Press
Source type
Working Paper
Language of publication
English
ProQuest document ID
2405773820
Copyright
© 2020. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.