Abstract

RNA-Seq is a powerful technique for obtaining quantitative information on gene expression. While many applications focus on estimated expression levels, it is also important to determine which genes are actively transcribed and which are not. The problem can be viewed as setting a biologically meaningful threshold for calling a gene expressed. We propose to define this threshold per sample, relative to the background level for non-expressed genomic features, which is inferred from the number of reads mapped to intergenic regions of the genome. To this end, we first define a stringent set of reference intergenic regions, based on available bulk RNA-Seq libraries for each species. We provide predefined regions, selected for different animal species with varying genome annotation quality, through the Bgee database. We then call genes expressed if their level of expression is significantly higher than the background noise. This approach can be applied to bulk as well as single-cell RNA-Seq, to a single library as well as to a combination of libraries from one condition. We show that the estimated proportion of expressed genes is biologically meaningful and stable between libraries originating from the same tissue, in both model and non-model organisms.
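
The calling step described above can be illustrated with a minimal sketch. It is not the BgeeCall implementation; it only shows one plausible reading of "significantly higher than the background noise". The assumptions here, none of which are stated in the abstract, are: expression is measured in TPM, the background is modelled as a normal distribution fitted to the log2 TPM of the reference intergenic regions, each gene is tested with a one-sided z-test, and the cutoff is alpha = 0.05. The function name call_expressed and the toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def call_expressed(gene_tpm, intergenic_tpm, alpha=0.05):
    """Return {gene: (p_value, is_expressed)} for one library.

    Assumption (not from the source): background noise is modelled as a
    normal distribution over log2(TPM) of reference intergenic regions.
    """
    # Fit the background on intergenic regions with at least some signal.
    background = [math.log2(v) for v in intergenic_tpm if v > 0]
    mu = mean(background)
    sigma = stdev(background)

    calls = {}
    for gene, tpm in gene_tpm.items():
        if tpm <= 0:
            # No reads at all: cannot be above background.
            calls[gene] = (1.0, False)
            continue
        z = (math.log2(tpm) - mu) / sigma
        # One-sided upper-tail p-value of the standard normal distribution.
        p_value = 0.5 * math.erfc(z / math.sqrt(2))
        calls[gene] = (p_value, p_value <= alpha)
    return calls


# Toy values for illustration only; these are not real measurements.
intergenic_tpm = [0.1, 0.2, 0.15, 0.05, 0.3, 0.12, 0.08, 0.25]
gene_tpm = {"geneA": 50.0, "geneB": 0.1, "geneC": 0.0}

calls = call_expressed(gene_tpm, intergenic_tpm)
for gene, (p_value, expressed) in calls.items():
    print(gene, f"p={p_value:.3g}", "expressed" if expressed else "not expressed")
```

Because the background is fitted per library, the effective TPM threshold in this sketch shifts with each sample's noise level, which is the property the abstract emphasises over a fixed global cutoff.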

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* https://github.com/BgeeDB/Methods_RNASeq_expression_calls

* https://github.com/BgeeDB/BgeeCall/tree/calls_paper

Details

Title
Robust inference of expression state in bulk and single-cell RNA-Seq using curated intergenic regions
Author
Fonseca Costa, Sara S; Rosikiewicz, Marta; Roux, Julien; Wollbrett, Julien; Bastian, Frederic B; Robinson-Rechavi, Marc
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2022
Publication date
Apr 1, 2022
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2646018623
Copyright
© 2022. This article is published under http://creativecommons.org/licenses/by/4.0/.