Abstract

Protein therapeutic design and property prediction are frequently hampered by data scarcity. Here we propose a new model, DyAb, that addresses this issue by leveraging a pair-wise representation to predict differences in protein properties, rather than absolute values. DyAb is built on top of a pre-trained protein language model and achieves a Spearman rank correlation of up to 0.85 on binding affinity prediction across molecules targeting three different antigens (EGFR, IL-6, and an internal target), with as few as 100 training data points. We employ DyAb in two design contexts: as a ranking model to score combinations of known mutations, and combined with a genetic algorithm to generate new sequences. Our method consistently generates novel antibody candidates with high binding rates, including designs that improve on the binding affinity of the lead molecule by more than tenfold. DyAb represents a powerful tool for engineering therapeutic protein properties in the low-data regimes common in early-stage drug development.
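
As a rough illustration of the pair-wise idea, the sketch below turns a small labeled set into ordered pairs whose regression target is the difference in the measured property. It is a minimal sketch, not the authors' implementation: the abstract does not say how pairs are represented, so using an element-wise embedding difference as the input is an assumption, and `make_pairs`, `toy_embeddings`, and `toy_affinities` are hypothetical names with made-up values.

```python
from itertools import permutations


def make_pairs(embeddings, affinities):
    """Turn N labeled sequences into N*(N-1) ordered training pairs.

    Assumption: each pair (i, j) is represented by the element-wise
    difference of the two embeddings, and the target is the difference
    of the two property values. The source does not specify this.
    """
    pairs = []
    for i, j in permutations(range(len(embeddings)), 2):
        delta_x = [a - b for a, b in zip(embeddings[i], embeddings[j])]
        delta_y = affinities[i] - affinities[j]
        pairs.append((delta_x, delta_y))
    return pairs


# Toy data: 3 hypothetical variants with 2-dimensional embeddings.
toy_embeddings = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
toy_affinities = [7.0, 8.5, 9.0]  # made-up property values for illustration
toy_pairs = make_pairs(toy_embeddings, toy_affinities)
```

Under this construction, N labeled sequences yield N(N-1) ordered pairs, so 100 measurements would give 9,900 training pairs. This may be one reason a difference-based formulation suits small datasets.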

Competing Interest Statement

The authors have declared no competing interest.

Details

Title
DyAb: sequence-based antibody design and property prediction in a low-data regime
Author
Lin, Joshua Yao-Yu; Hofmann, Jennifer L; Leaver-Fay, Andrew; Liang, Wei-Ching; Vasilaki, Stefania; Lee, Edith; Pinheiro, Pedro O; Tagasovska, Natasa; Kiefer, James R; Wu, Yan; Seeger, Franziska; Bonneau, Richard; Gligorijevic, Vladimir; Watkins, Andrew; Cho, Kyunghyun; Frey, Nathan C
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2025
Publication date
Feb 2, 2025
Publisher
Cold Spring Harbor Laboratory Press
Source type
Working Paper
Language of publication
English
ProQuest document ID
3162651455
Copyright
© 2025. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.