Abstract

The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting currently incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, but the inherent shift and rotational invariance of this representation also crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
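
The shift and rotational invariance of an angular representation can be illustrated with a minimal Python sketch. This is not the authors' code: the helper names (`dihedral`, `rigid_transform`), the toy coordinates, and the sign convention of the angle are all assumptions made for illustration, and the released codebase may compute its angles differently. The sketch computes the dihedral angle defined by four consecutive points, applies an arbitrary rotation and translation, and checks that the angle is unchanged.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians for four consecutive points.

    The sign convention here is an assumption of this sketch.
    """
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)  # normals of the two planes
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)

def rigid_transform(p, alpha, beta, shift):
    """Rotate p about z by alpha, then about x by beta, then translate."""
    x, y, z = p
    x, y = (x * math.cos(alpha) - y * math.sin(alpha),
            x * math.sin(alpha) + y * math.cos(alpha))
    y, z = (y * math.cos(beta) - z * math.sin(beta),
            y * math.sin(beta) + z * math.cos(beta))
    return (x + shift[0], y + shift[1], z + shift[2])

# Hypothetical toy coordinates, not taken from any real protein.
toy = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 0.8, 0.9)]
moved = [rigid_transform(p, 0.7, -1.2, (10.0, -3.0, 4.5)) for p in toy]

angle_before = dihedral(*toy)
angle_after = dihedral(*moved)
print(abs(angle_before - angle_after) < 1e-9)
```

Because the angle depends only on the relative positions of the four points, a model that operates on such angles never sees the global position or orientation of the structure, which is the property the abstract credits with removing the need for equivariant networks.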

The ability to engineer novel protein structures has tremendous scientific and therapeutic impact. Here, the authors develop a generative model that operates on an angular representation of protein structures to create high-quality protein backbones.

Details

Title
Protein structure generation via folding diffusion
Author
Wu, Kevin E. 1 ; Yang, Kevin K. 2 ; van den Berg, Rianne 3 ; Alamdari, Sarah 2 ; Zou, James Y. 4 ; Lu, Alex X. 2 ; Amini, Ava P. 2

1 Stanford University, Department of Computer Science, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956); Stanford University, Center for Personal Dynamic Regulomes, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956); Stanford University School of Medicine, Department of Biomedical Data Science, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956)
2 Microsoft Research, Cambridge, USA (GRID:grid.24488.32) (ISNI:0000 0004 0503 404X)
3 Microsoft Research, Amsterdam, Netherlands (GRID:grid.24488.32)
4 Stanford University, Department of Computer Science, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956); Stanford University School of Medicine, Department of Biomedical Data Science, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956)
Pages
1059
Publication year
2024
Publication date
2024
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2922281766
Copyright
© The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.