FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking

Abstract

Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional protein features but have yet to be trained on fusion oncoprotein sequences. We introduce FusOn-pLM, a fine-tuned pLM trained on a newly curated, comprehensive set of fusion oncoprotein sequences, FusOn-DB. Employing a unique cosine-scheduled masked language modeling strategy, FusOn-pLM dynamically adjusts masking rates (15%–40%) to optimize feature extraction and representation quality, surpassing baseline embeddings in fusion-specific tasks, including localization, puncta formation, and disorder prediction. FusOn-pLM uniquely predicts drug-resistant mutations, providing insights for therapeutic design that anticipates resistance mechanisms. In total, FusOn-pLM provides biologically relevant representations for advancing therapeutic discovery in fusion-driven cancers.

Fusion oncoproteins drive paediatric cancers but are challenging to target due to their intrinsic disorder and lack of druggable pockets. Here, authors present FusOn-pLM, trained on FusOn-DB, which uses dynamic masking to outperform baselines in fusion-specific tasks and predict drug-resistant mutations, advancing therapeutic design.
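
The abstract describes a cosine-scheduled masking rate that moves between 15% and 40% during training. The record does not give the exact schedule, so the following is only a minimal sketch under stated assumptions: the rate is assumed to ramp upward along a half cosine over a fixed number of steps, and the names `cosine_mask_rate`, `mask_sequence`, and the `<mask>` token string are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def cosine_mask_rate(step, total_steps, low=0.15, high=0.40):
    """Masking rate at a given training step.

    Assumption: a half-cosine ramp from `low` (first step) to `high`
    (last step). The published schedule may differ in shape or direction.
    """
    if total_steps <= 1:
        return high
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return low + 0.5 * (high - low) * (1.0 - math.cos(math.pi * progress))


def mask_sequence(seq, rate, rng, mask_token="<mask>"):
    """Replace roughly `rate` of the residues in `seq` with `mask_token`.

    Returns the token list and the sorted masked positions. The mask token
    string is illustrative; a real tokenizer defines its own.
    """
    tokens = list(seq)
    if not tokens:
        return tokens, []
    # Mask at least one residue, never more than the sequence length.
    n_mask = min(len(tokens), max(1, round(rate * len(tokens))))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    for i in positions:
        tokens[i] = mask_token
    return tokens, positions


if __name__ == "__main__":
    rng = random.Random(0)
    toy_sequence = "MKTAYIAKQRQISFVKSHFS"  # toy sequence, not from FusOn-DB
    for step in (0, 50, 100):
        rate = cosine_mask_rate(step, total_steps=101)
        tokens, positions = mask_sequence(toy_sequence, rate, rng)
        print(step, round(rate, 3), len(positions))
```

In a masked language modeling loop, the rate returned for the current step would set how many residues are hidden before the model is asked to reconstruct them, so the reconstruction task changes in difficulty as training proceeds.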

Details

Subject

Cancer; Proteins; Task scheduling; Oncoproteins; Fusion protein; Mutation; Language; Therapeutic targets; Drug resistance; Masking; Protein structure; Localization; Chromosome translocations; Drug development; Pediatrics; Representations

Title

FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking

Publication title

Nature Communications; London

Volume

Issue

Pages

1436

Publication year

2025

Publication date

2025

Publisher

Nature Publishing Group

Place of publication

London

Country of publication

United States

Publication subject

Sciences: Comprehensive Works

e-ISSN

20411723

Source type

Scholarly Journal

Language of publication

English

Document type

Journal Article

Publication history

Online publication date

2025-02-07

Milestone dates

2025-01-30 (Registration); 2024-06-03 (Received); 2025-01-24 (Accepted)

Publication history

First posting date

07 Feb 2025

DOI

https://doi.org/10.1038/s41467-025-56745-6

ProQuest document ID

3164511662

Document URL

https://www.proquest.com/scholarly-journals/fuson-plm-fusion-oncoprotein-specific-language/docview/3164511662/se-2?accountid=208611

Last updated

2025-07-27

Database

ProQuest One Academic
