Abstract

Because influenza vaccine effectiveness depends on a good antigenic match between the vaccine and circulating viruses, it is important to assess the antigenic properties of newly emerging variants continuously. With the increasing application of real-time pathogen genomic surveillance, a key question is whether antigenic properties can be reliably predicted from influenza virus genomic information. Trained on validated, linked datasets of influenza virus genomic data and wet-lab experimental results, in silico models may learn to predict immune escape of variants of interest from the protein sequence alone. In this study, we compared several machine-learning methods for reconstructing antigenic map coordinates from HA1 protein sequences of influenza A(H3N2) virus, ranking the substitutions responsible for major antigenic changes, and recognizing variants with novel antigenic properties that may warrant future vaccine updates. Methods based on deep learning language models (BiLSTM and ProtBERT) outperformed more classical approaches that relied solely on genetic distances and physicochemical properties of amino acid sequences, particularly for fine-grained tasks such as detecting antigenic change driven by a single amino acid substitution and running in silico deep mutational scanning experiments to rank the substitutions with the largest impact on antigenic properties. Because the best-performing model, which produces protein embeddings, is agnostic to the specific pathogen, the presented approach may be applicable to other pathogens.
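
The in silico deep mutational scanning mentioned in the abstract can be illustrated with a minimal sketch: enumerate every single amino acid substitution of a sequence, predict antigenic map coordinates for each mutant, and rank substitutions by how far they move from the reference. This is not the authors' implementation. The function names are hypothetical, `toy_predict_coords` is a placeholder standing in for a trained model (such as one built on BiLSTM or ProtBERT embeddings), and positions are numbered from 1 within the fragment passed in rather than in H3 numbering.

```python
import math
from typing import Callable, Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
Coords = Tuple[float, float]          # a point on a 2D antigenic map


def single_substitutions(seq: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, mutant) for every single amino acid substitution.

    Labels use the usual wild-type/position/mutant form, e.g. "A1D",
    with positions counted from 1 within `seq`.
    """
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]


def rank_substitutions(
    seq: str,
    predict_coords: Callable[[str], Coords],
    top: int = 5,
) -> List[Tuple[str, float]]:
    """Rank substitutions by predicted displacement on the antigenic map."""
    reference = predict_coords(seq)
    scored = [
        (label, math.dist(reference, predict_coords(mutant)))
        for label, mutant in single_substitutions(seq)
    ]
    # Largest predicted antigenic shift first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


def toy_predict_coords(seq: str) -> Coords:
    """Hypothetical placeholder predictor, NOT a real antigenic model.

    It maps a sequence to (hydrophobic count, charged count) only so that
    the scanning loop above has something deterministic to call.
    """
    hydrophobic = sum(aa in "AVILMFWY" for aa in seq)
    charged = sum(aa in "DEKR" for aa in seq)
    return (float(hydrophobic), float(charged))
```

In practice, `toy_predict_coords` would be replaced by a regression model trained to map sequence embeddings to antigenic map coordinates. The scanning and ranking logic does not depend on which predictor is supplied.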

Competing Interest Statement

The authors have declared no competing interest.

Details

Title
Language models learn to represent antigenic properties of human influenza A(H3) virus
Author
Durazzi, Francesco; Koopmans, Marion; Fouchier, Ron A. M.; Remondini, Daniel
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2025
Publication date
Jan 18, 2025
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
3156840175
Copyright
© 2025. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.