Abstract

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate computational metrics for assessing the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network, and a protein language model. Focusing on two enzyme families, we expressed and purified over 440 natural and generated sequences, each with 70-90% identity to its closest natural sequence, and used them to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved experimental success rates by 44-100%. Surprisingly, neither sequence identity to natural sequences nor AlphaFold2 residue-confidence scores were predictive of enzyme activity. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and by helping to select active variants for experimental testing.
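The 70-90% identity criterion rests on one per-sequence metric: the percent identity of a sequence to its closest natural sequence. Below is a minimal Python sketch of that metric. It makes several assumptions. Alignment uses Needleman-Wunsch global alignment with toy scoring (match +1, mismatch -1, linear gap -1). Identity is the number of identical columns divided by alignment length. The helper names (`global_align`, `percent_identity`, `closest_natural_identity`, `in_identity_band`) and the toy sequences are hypothetical. None of this reproduces the authors' pipeline, and tools such as BLAST or scoring with a substitution matrix like BLOSUM62 would give somewhat different values.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty (toy scoring).
    Returns the two aligned strings, using '-' for gaps."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue a[i-1] against residue b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned columns divided by total alignment length (assumed definition)."""
    aln_a, aln_b = global_align(a, b)
    if not aln_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


def closest_natural_identity(generated, naturals):
    """Percent identity of `generated` to its most similar sequence in `naturals`."""
    return max(percent_identity(generated, nat) for nat in naturals)


def in_identity_band(generated, naturals, low=70.0, high=90.0):
    """Hypothetical filter: keep sequences whose closest-natural identity is 70-90%."""
    return low <= closest_natural_identity(generated, naturals) <= high


if __name__ == "__main__":
    naturals = ["MKTAYIAKQR", "GSHMLEDPVR"]  # toy "natural" set
    print(closest_natural_identity("MKTAYLAKQW", naturals))  # two substitutions -> 80.0
    print(in_identity_band("MKTAYLAKQW", naturals))  # True
    print(in_identity_band("MKTAYIAKQR", naturals))  # exact copy (100%) -> False
```

In this toy run, a 10-residue sequence with two substitutions relative to its nearest neighbor scores 80.0% and falls inside the band. An exact copy scores 100% and is rejected. The quadratic dynamic program is adequate for a few hundred enzyme-length sequences but slow against large databases, where heuristic search tools are normally used instead.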

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* https://doi.org/10.5281/zenodo.7688668

* https://github.com/seanrjohnson/protein_scoring

* https://github.com/seanrjohnson/protein_gibbs_sampler

Details

Title
Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks
Author
Johnson, Sean R; Fu, Xiaozhi; Viknander, Sandra; Goldin, Clara; Monaco, Sarah; Zelezniak, Aleksej; Yang, Kevin K
Section
New Results
Publication year
2023
Publication date
Mar 4, 2023
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
Copyright
© 2023. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.