Benchmarking of cell type deconvolution pipelines for transcriptomics data

Abstract

Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale, and the choice of normalization has a dramatic impact on some, but not all, methods. Overall, methods that use scRNA-seq data perform comparably to the best-performing bulk methods, whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance.
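The pseudo-bulk idea and the linear-scale finding summarised above can be illustrated with a minimal sketch. It is not the authors' pipeline. The reference matrix is randomly simulated rather than derived from scRNA-seq data, the mixture is built from mean cell type profiles instead of sampled cells, and non-negative least squares (via `scipy.optimize.nnls`) is only a stand-in for the benchmarked methods. All names (`signature`, `true_props`, `deconvolve`) are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_genes, n_types = 200, 4

# Assumption: a simulated reference of mean linear-scale expression per
# cell type (genes x cell types), standing in for an scRNA-seq reference.
signature = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, n_types))

# Known proportions used to build one noiseless pseudo-bulk mixture.
true_props = np.array([0.5, 0.3, 0.15, 0.05])
pseudo_bulk = signature @ true_props


def deconvolve(bulk, reference):
    """Estimate proportions with non-negative least squares, scaled to sum to 1."""
    coef, _ = nnls(reference, bulk)
    return coef / coef.sum()


def rmse(estimate, truth):
    """Root-mean-square error between estimated and true proportions."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


# Linear scale: the mixture is an exact linear combination of the reference.
est_linear = deconvolve(pseudo_bulk, signature)

# Log scale: log2(x + 1) is applied to both mixture and reference, which
# breaks the linear mixing assumption.
est_log = deconvolve(np.log2(pseudo_bulk + 1), np.log2(signature + 1))

rmse_linear = rmse(est_linear, true_props)
rmse_log = rmse(est_log, true_props)
print(f"RMSE linear scale: {rmse_linear:.2e}")
print(f"RMSE log scale:    {rmse_log:.2e}")
```

In this toy setting the linear-scale estimate recovers the simulated proportions almost exactly, because a mixture of expression profiles is additive only before log transformation. Real data add noise, normalization effects and missing cell types, which this sketch does not model.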

Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.

Details

Title

Benchmarking of cell type deconvolution pipelines for transcriptomics data

Author

Avila Cobos, Francisco¹; Alquicira-Hernandez, José²; Powell, Joseph E²; Mestdagh, Pieter³; De Preter, Katleen³

¹ Ghent University, Center for Medical Genetics Ghent, Department of Biomolecular Medicine, Ghent, Belgium (GRID:grid.5342.0) (ISNI:0000 0001 2069 7798); Cancer Research Institute Ghent (CRIG), Ghent, Belgium (GRID:grid.5342.0); Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, Australia (GRID:grid.415306.5) (ISNI:0000 0000 9983 6924)
² Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, Australia (GRID:grid.415306.5) (ISNI:0000 0000 9983 6924); Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia (GRID:grid.1003.2) (ISNI:0000 0000 9320 7537)
³ Ghent University, Center for Medical Genetics Ghent, Department of Biomolecular Medicine, Ghent, Belgium (GRID:grid.5342.0) (ISNI:0000 0001 2069 7798); Cancer Research Institute Ghent (CRIG), Ghent, Belgium (GRID:grid.5342.0)

Publication year

2020

Publication date

2020

Publisher

Nature Publishing Group

e-ISSN

20411723

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/s41467-020-19015-1

ProQuest document ID

2471540699

© The Author(s) 2020. corrected publication 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
