Abstract

Structural disorder is widespread in eukaryotic proteins and is vital for their function in diverse biological processes. It is therefore highly desirable to be able to predict the degree of order and disorder from amino acid sequence. It is, however, notoriously difficult to predict the degree of local flexibility within structured domains, and the presence and nuances of localized rigidity within intrinsically disordered regions. To identify such instances, we used the CheZOD database, which provides accurate, balanced, and continuous-valued quantification of protein (dis)order at amino acid resolution based on NMR chemical shifts. To predict the spectrum of protein disorder as comprehensively as possible, we constructed the sequence-based protein order/disorder predictor ODiNPred, trained on an expanded version of CheZOD. ODiNPred uses a deep neural network that takes 157 unique sequence features as input and was trained on 1325 protein sequences together with their experimental NMR chemical shift data. Cross-validation for 117 protein sequences shows that ODiNPred predicts the continuous variation in order along the protein sequence better than contemporary predictors, suggesting that the latter are limited by the quality of their training data. The inclusion of evolutionary features reduces the performance gap between ODiNPred and its peers, but analysis shows that ODiNPred retains greater accuracy for the more challenging prediction of intermediate disorder.

Details

Title
ODiNPred: comprehensive prediction of protein order and disorder
Author
Dass, Rupashree 1; Mulder, Frans A A 2; Nielsen, Jakob Toudahl 2

1 Aarhus University, Interdisciplinary Nanoscience Center (iNANO), Aarhus C, Denmark (GRID:grid.7048.b) (ISNI:0000 0001 1956 2722)
2 Aarhus University, Interdisciplinary Nanoscience Center (iNANO), Aarhus C, Denmark (GRID:grid.7048.b) (ISNI:0000 0001 1956 2722); Aarhus University, Department of Chemistry, Aarhus C, Denmark (GRID:grid.7048.b) (ISNI:0000 0001 1956 2722)
Publication year
2020
Publication date
2020
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1888631210
Copyright
© The Author(s) 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.