Abstract

Machine learning could enable an unprecedented level of control in protein engineering for therapeutic and industrial applications. To be useful for designing proteins with desired properties, machine learning models must capture the protein sequence-function relationship, often termed the fitness landscape. Existing benchmarks such as CASP and CAFA assess protein structure and function prediction, respectively, yet they do not target metrics relevant to protein engineering. In this work, we introduce Fitness Landscape Inference for Proteins (FLIP), a benchmark for function prediction intended to encourage rapid scoring of representation learning for protein engineering. Our curated tasks, baselines, and metrics probe model generalization in settings relevant to protein engineering, e.g., low-resource and extrapolative. Currently, FLIP encompasses experimental data on adeno-associated virus stability for gene therapy, protein domain B1 stability and immunoglobulin binding, and thermostability from multiple protein families. To enable ease of use and future expansion to new tasks, all data are presented in a standard format. FLIP scripts and data are freely accessible at https://benchmark.protein.properties/home.

Competing Interest Statement

KKY was previously employed by Generate Biomedicines.

Footnotes

* Glossary and experimental replicates in supplemental.

* https://benchmark.protein.properties/home

* https://github.com/J-SNACKKB/FLIP

Details

Title
FLIP: Benchmark tasks in fitness landscape inference for proteins
Author
Dallago, Christian; Mou, Jody; Johnston, Kadina E; Wittmann, Bruce; Bhattacharya, Nicholas; Goldman, Samuel Lucas; Madani, Ali; Yang, Kevin K
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2022
Publication date
Jan 19, 2022
Publisher
Cold Spring Harbor Laboratory Press
Source type
Working Paper
Language of publication
English
ProQuest document ID
2621097282
Copyright
© 2022. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.