Abstract
Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance in identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000× fewer proteins and has 548× fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
Engineering stabilized proteins is essential for industrial and pharmaceutical biotechnologies. Here, the authors present Stability Oracle, a graph-transformer framework trained on masked protein microenvironments to predict protein thermodynamic stability, using less training data while achieving improved generalization.
Details




1 Department of Computer Science, UT Austin, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924); Intelligent Proteins, LLC, Austin, USA (GRID:grid.89336.37); Department of Chemistry, UT Austin, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)
2 Department of Computer Science, UT Austin, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)
3 Intelligent Proteins, LLC, Austin, USA (GRID:grid.89336.37); Department of Molecular Biosciences, UT Austin, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)
4 McKetta Department of Chemical Engineering, UT Austin, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)
5 Department of Molecular Biosciences, UT Austin, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)
6 Chandra Family Department of Electrical and Computer Engineering, UT Austin, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)