Abstract

Plant metabolites produced via diverse pathways are important for plant survival, human nutrition and medicine. However, the pathway memberships of most plant enzyme genes are unknown. While co-expression is useful for assigning genes to pathways, expression correlation may exist only under specific spatiotemporal and conditional contexts. Utilizing >600 expression values and similarity data combinations from tomato, three strategies for predicting membership in 85 pathways were explored: naive prediction (identifying pathways with the most similarly expressed genes), unsupervised learning and supervised learning. Optimal predictions for different pathways require distinct data combinations that, in some cases, are indicative of biological processes relevant to pathway functions. Naive prediction produced higher error rates than the machine learning methods. In 52 pathways, unsupervised learning performed better than the supervised approach, which may be due to the limited availability of training data. Furthermore, using gene-to-pathway expression similarities led to prediction models that outperformed those based simply on gene expression levels. Our study highlights the need to extensively explore expression-based features and prediction strategies to maximize the accuracy of metabolic pathway membership assignment. We anticipate that the prediction framework outlined here can be applied to other species and used to improve plant pathway annotation.
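
The naive strategy named in the abstract (assigning a gene to the pathway with the most similarly expressed genes) can be illustrated with a minimal sketch. The abstract does not state the similarity measure or how similarities are aggregated per pathway, so the use of Pearson correlation and of the mean over pathway members is an assumption made here. The function name, pathway labels and expression values are hypothetical toy data, not taken from the study.

```python
import numpy as np


def naive_pathway_prediction(query_expr, pathway_exprs):
    """Pick the pathway whose known member genes are, on average,
    most correlated with the query gene.

    Assumptions (not from the source): similarity is the Pearson
    correlation, and a pathway's score is the mean over its members.
    """
    scores = {}
    for pathway, members in pathway_exprs.items():
        # Row 0 is the query gene; the remaining rows are pathway members.
        stacked = np.vstack([query_expr, members])
        # First row of the correlation matrix, minus the self-correlation.
        corr_to_members = np.corrcoef(stacked)[0, 1:]
        scores[pathway] = float(np.mean(corr_to_members))
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical expression values across five conditions.
query = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pathways = {
    "pathway_A": np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
                           [1.0, 2.0, 3.0, 4.0, 6.0]]),
    "pathway_B": np.array([[5.0, 4.0, 3.0, 2.0, 1.0],
                           [3.0, 1.0, 4.0, 1.0, 5.0]]),
}

best, scores = naive_pathway_prediction(query, pathways)
print(best, scores)
```

In this toy example the query gene rises steadily across conditions, as do both members of the hypothetical pathway_A, so that pathway receives the higher mean correlation. The machine learning strategies described in the abstract would instead use such similarities as features rather than as a direct decision rule.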

Competing Interest Statement

The authors have declared no competing interest.

Details

Title
Optimizing the use of gene expression data to predict plant metabolic pathway memberships
Author
Wang, Peipei; Moore, Bethany M; Uygun, Sahra; Lehti-Shiu, Melissa D; Barry, Cornelius S; Shiu, Shin-Han
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2020
Publication date
Oct 7, 2020
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2424261557
Copyright
© 2020. This article is published under http://creativecommons.org/licenses/by-nd/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.