Abstract

Background: Profiling of mRNA expression is an important method for identifying biomarkers, but it is complicated by limited correlations between mRNA expression and protein abundance. We hypothesised that these correlations could be improved by mathematical models based on measuring splice variants and time delay in protein translation.

Methods: We characterised time-series of primary human naive CD4+ T cells during early T-helper type 1 differentiation with RNA-sequencing and mass-spectrometry proteomics. We then performed computational time-series analysis in this system and in two other key human and murine immune cell types. Linear mathematical mixed time-delayed splice variant models were used to predict protein abundances, and the models were validated using out-of-sample predictions. Lastly, we re-analysed RNA-Seq datasets to evaluate biomarker discovery in five T-cell associated diseases, validating the findings for multiple sclerosis (MS) and asthma.

Results: The new models demonstrated median correlations of mRNA-to-protein abundance of 0.79-0.94, significantly outperforming, in cross-validation tests, models that did not include multiple splice variants and time delays. Our mathematical models yielded more differentially expressed proteins between patients and controls in all five diseases. Moreover, analysis of these proteins in asthma and MS supported their relevance. One marker, sCD27, was clinically validated in MS for treatment response and prognosis, using two independent cohorts.

Conclusion: Our splice variant and time-delay models substantially improved the prediction of protein abundance from mRNA data in three immune cell types. The models provided valuable biomarker candidates, which were validated in clinical studies of MS and asthma. We propose that our strategy is generally applicable for biomarker discovery.
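The Methods in the abstract above describe linear models that predict protein abundance from several splice variants with a time delay, validated out of sample. The following is a minimal sketch of that general idea and not the authors' implementation: the abstract does not give the model equations, so the single shared delay, the linear interpolation of delayed mRNA levels, the leave-one-out validation, the synthetic data, and all function and variable names here are illustrative assumptions.

```python
import numpy as np


def interpolate_delayed(times, mrna, delay):
    """Return each splice variant's mRNA level at (times - delay).

    Uses linear interpolation; time points before the first measurement
    are clamped to the first measured value (an assumption of this sketch).
    """
    shifted = np.asarray(times, dtype=float) - delay
    columns = [np.interp(shifted, times, mrna[:, k]) for k in range(mrna.shape[1])]
    return np.column_stack(columns)


def fit_delay_model(times, mrna, protein, delays):
    """Fit protein(t) ~ sum_k w_k * mrna_k(t - delay) + intercept.

    For each candidate delay, the weights are estimated by ordinary least
    squares and scored by leave-one-out squared error; the delay with the
    lowest out-of-sample error is kept.
    """
    n = len(times)
    best = None
    for delay in delays:
        # Design matrix: delayed splice-variant levels plus an intercept column.
        X = np.column_stack([interpolate_delayed(times, mrna, delay), np.ones(n)])
        errors = []
        for i in range(n):
            keep = np.arange(n) != i
            w, *_ = np.linalg.lstsq(X[keep], protein[keep], rcond=None)
            errors.append((X[i] @ w - protein[i]) ** 2)
        loo_mse = float(np.mean(errors))
        if best is None or loo_mse < best["loo_mse"]:
            w_full, *_ = np.linalg.lstsq(X, protein, rcond=None)
            best = {"loo_mse": loo_mse, "delay": float(delay), "weights": w_full}
    return best


# Synthetic example (hypothetical data): two splice variants, true delay of 2.
times = np.arange(8, dtype=float)
mrna = np.column_stack([np.sin(times), np.sqrt(times + 1.0)])
true_weights = np.array([2.0, -1.0])
protein = interpolate_delayed(times, mrna, 2.0) @ true_weights + 0.5

result = fit_delay_model(times, mrna, protein, delays=[0.0, 1.0, 2.0, 3.0])
print(result["delay"], result["weights"], result["loo_mse"])
```

Scoring each candidate delay by held-out error rather than by training fit mirrors the out-of-sample validation mentioned in the Methods; with real time-series, a per-gene delay and regularisation of the weights would likely be needed.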

Footnotes

* The format has mainly been changed from that of a Nature letter to that of a traditional research article. As a result, the paper has been significantly lengthened to give room for additional background, extra figures and discussion.

Details

Title
A validated strategy to infer protein biomarkers from RNA-Seq by combining multiple mRNA splice variants and time-delay
Author
Magnusson, Rasmus; Rundquist, Olof; Kim, Min Jung; Hellberg, Sandra; Chan, Hyun Na; Benson, Mikael; Gomez-Cabrero, David; Kockum, Ingrid Skelton; Tegner, Jesper; Piehl, Fredrik; Jagodic, Maja; Mellergard, Johan; Altafini, Claudio; Ernerudh, Jan; Jenmalm, Maria C; Nestor, Colm E; Kim, Min-Sik; Gustafsson, Mika
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2020
Publication date
Feb 21, 2020
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2202945966
Copyright
© 2020. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.