Full Text

Turn on search term navigation

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Protein structure prediction continues to pose multiple challenges despite outstanding progress that is largely attributable to the use of novel machine learning techniques. One of the widely used representations of local 3D structure—protein blocks (PBs)—can be treated in a similar way to secondary structure classes. Here, we present a new approach for predicting local conformation in terms of PB classes solely from amino acid sequences. We apply the RMSD metric to ensure unambiguous future 3D protein structure recovery. The selection of statistically assessed features is a key component of the proposed method. We suggest that ML input features should be created from the statistically significant predictors that are derived from the amino acids’ physicochemical properties and the resolved structures’ statistics. The statistical significance of the suggested features was assessed using a stepwise regression analysis that permitted the evaluation of the contribution and statistical significance of each predictor. We used the set of 380 statistically significant predictors as a learning model for the regression neural network that was trained using the PISCES30 dataset. When using the same dataset and metrics for benchmarking, our method outperformed all other methods reported in the literature for the CB513 nonredundant dataset (for the PBs, Q16 = 81.01%, and for the DSSP, Q3 = 85.99% and Q8 = 79.35%).

Details

Title
Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information
Author
Milchevskiy, Yury V 1 ; Milchevskaya, Vladislava Y 2 ; Nikitin, Alexei M 1 ; Kravatsky, Yury V 3   VIAFID ORCID Logo 

 Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia[email protected] (Y.V.K.) 
 Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia[email protected] (Y.V.K.); Institute of Medical Statistics and Bioinformatics, University of Cologne, 50931 Cologne, Germany 
 Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia[email protected] (Y.V.K.); Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia 
First page
15656
Publication year
2023
Publication date
2023
Publisher
MDPI AG
ISSN
16616596
e-ISSN
14220067
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2888182389
Copyright
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.