
Abstract

While generative models hold immense promise for protein design, existing models are typically backbone-only, despite the indispensable role that sidechain atoms play in mediating function. As prerequisite knowledge, all-atom 3D structure generation requires the discrete sequence to specify sidechain identities, which poses a multimodal generation problem. We propose PLAID (Protein Latent Induced Diffusion), which samples from the latent space of a pre-trained sequence-to-structure predictor, ESMFold. The sampled latent embedding is then decoded with frozen decoders into the sequence and all-atom structure. Importantly, PLAID only requires sequence input during training, thus augmenting the dataset size by 2-4 orders of magnitude compared to the Protein Data Bank. It also makes more annotations available for functional control. As a demonstration of annotation-based prompting, we perform compositional conditioning on function and taxonomy using classifier-free guidance. Intriguingly, function-conditioned generations learn active site residue identities, even though these residues are non-adjacent in the sequence, and can correctly place the sidechain atoms. We further show that PLAID can generate transmembrane proteins with expected hydrophobicity patterns, perform motif scaffolding, and improve unconditional sample quality for long sequences. Model weights and training code are publicly available at github.com/amyxlu/plaid.
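
The conditioning step mentioned in the abstract can be illustrated with a minimal sketch of classifier-free guidance over two labels (function and taxonomy). This is not taken from the PLAID codebase: the denoiser signature, the names `cfg_noise_estimate` and `toy_denoiser`, and the guidance convention (scale 0 is unconditional, scale 1 is purely conditional) are assumptions chosen for illustration, and the paper's exact formulation may differ.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical denoiser signature (an assumption, not the PLAID API):
# (noisy latent, function label or None, organism label or None) -> predicted noise.
Denoiser = Callable[[Sequence[float], Optional[int], Optional[int]], List[float]]


def cfg_noise_estimate(
    denoiser: Denoiser,
    x_t: Sequence[float],
    function_id: Optional[int],
    organism_id: Optional[int],
    guidance_scale: float,
) -> List[float]:
    """Blend unconditional and conditional noise predictions.

    Uses the convention eps = eps_uncond + w * (eps_cond - eps_uncond),
    so w = 0 ignores the labels and w > 1 amplifies their effect.
    Passing None for a label drops that condition.
    """
    eps_uncond = denoiser(x_t, None, None)
    eps_cond = denoiser(x_t, function_id, organism_id)
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def toy_denoiser(
    x_t: Sequence[float],
    function_id: Optional[int],
    organism_id: Optional[int],
) -> List[float]:
    """Stand-in for a trained network: each supplied label shifts the output."""
    shift = (0.0 if function_id is None else 1.0) + (0.0 if organism_id is None else 0.5)
    return [v + shift for v in x_t]


guided = cfg_noise_estimate(toy_denoiser, [0.0, 2.0], function_id=3, organism_id=7, guidance_scale=2.0)
```

In a real sampler this blended estimate would replace the plain noise prediction at every denoising step; supplying only one of the two labels gives guidance on function or taxonomy alone.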

Competing Interest Statement

AXL, SAR, SK, VG, KC, RB, and NCF are employees of Genentech Inc., a member of the Roche Group.

Footnotes

* Title and formatting update: evaluation figures updated, motif scaffolding figure moved to main text, various other changes to content ordering

Details

1009240
Title
All-Atom Protein Generation with Latent Diffusion
Publication title
bioRxiv; Cold Spring Harbor
Publication year
2025
Publication date
Feb 13, 2025
Section
New Results
Publisher
Cold Spring Harbor Laboratory Press
Source
BioRxiv
Place of publication
Cold Spring Harbor
Country of publication
United States
University/institution
Cold Spring Harbor Laboratory Press
Publication subject
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
 
 
Milestone dates
2024-12-05 (Version 1)
ProQuest document ID
3141237419
Document URL
https://www.proquest.com/working-papers/all-atom-protein-generation-with-latent-diffusion/docview/3141237419/se-2?accountid=208611
Copyright
© 2025. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-02-14
Database
ProQuest One Academic