Abstract

Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, where accurate prediction methods are still required. Recently, deep learning based methods have outperformed conventional algorithms in computer vision and natural language processing, owing to the prevention of overfitting and to efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests and benchmarked using three types of protein descriptors, training datasets of varying sizes, and GO terms from different levels. Furthermore, to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed using the CAFA2 and CAFA3 challenge datasets, in comparison with state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study considering the ‘biofilm formation process’ in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred.

Details

Title
DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks
Author
Sureyya Rifaioglu Ahmet 1; Tunca, Doğan 2; Jesus Martin Maria 3; Cetin-Atalay Rengul 4; Atalay Volkan 5

1 Department of Computer Engineering, METU, Ankara, Turkey (GRID:grid.6935.9) (ISNI:0000 0001 1881 7391); İskenderun Technical University, Department of Computer Engineering, Hatay, Turkey (GRID:grid.503005.3) (ISNI:0000 0004 5896 2288)
2 European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, UK (GRID:grid.225360.0) (ISNI:0000 0000 9709 7726); Graduate School of Informatics, METU, KanSiL, Department of Health Informatics, Ankara, Turkey (GRID:grid.6935.9) (ISNI:0000 0001 1881 7391)
3 European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, UK (GRID:grid.225360.0) (ISNI:0000 0000 9709 7726)
4 Graduate School of Informatics, METU, KanSiL, Department of Health Informatics, Ankara, Turkey (GRID:grid.6935.9) (ISNI:0000 0001 1881 7391)
5 Department of Computer Engineering, METU, Ankara, Turkey (GRID:grid.6935.9) (ISNI:0000 0001 1881 7391); Graduate School of Informatics, METU, KanSiL, Department of Health Informatics, Ankara, Turkey (GRID:grid.6935.9) (ISNI:0000 0001 1881 7391)
Publication year
2019
Publication date
2019
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2225123500
Copyright
© The Author(s) 2019. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.