Abstract

Background

Despite advancements in AI for diagnosing neurodegenerative diseases (NDs) like Alzheimer's disease (AD), existing models are task‐specific and lack diagnostic reasoning, limiting their ability to interpret complex clinical and imaging data. Leveraging recent progress in multimodal foundation models (FMs) for healthcare, we developed Brain‐FM, an FM‐powered visual question answering (VQA) system. Brain‐FM enhances brain health diagnostics using image‐text data from scientific literature and is adaptable to diverse healthcare tasks and subpopulations.

Method

We constructed PMC‐Dementia, a curated dataset of image‐caption pairs sourced from 4.5 million scientific publications, focusing on ND subtypes. To study the FM's ability to interpret visual inputs, we employed two visual encoders, CLIP (Radford et al., 2021) and DINOv2 (Oquab et al., 2024), and integrated their features to construct Brain‐FM. Next, we randomly selected magnetic resonance imaging (MRI) scans of 200 participants from the AIBL (Australian Imaging Biomarkers and Lifestyle Study of Ageing) dataset to generate a VQA dataset. We then evaluated Brain‐FM's visual reasoning capabilities in identifying the correct MRI plane and sequence (such as T1‐weighted, T2‐weighted, and Fluid Attenuated Inversion Recovery, FLAIR) and in diagnostic reasoning.
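The abstract does not specify how the CLIP and DINOv2 features are integrated; a common approach in multimodal VQA systems is to concatenate the two encoders' per‐patch features along the channel dimension before projecting them into the language model's embedding space. The following is a minimal pure‐Python sketch of that assumed fusion step, not the authors' implementation; the feature dimensions and the `fuse` helper are illustrative.

```python
import random

CLIP_DIM = 768     # illustrative CLIP patch-feature width (assumption)
DINO_DIM = 1024    # illustrative DINOv2 patch-feature width (assumption)
NUM_PATCHES = 4    # tiny toy patch sequence for the sketch

def fake_features(num_patches, dim, seed):
    """Stand-in for an encoder's per-patch output: num_patches x dim."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(num_patches)]

def fuse(clip_feats, dino_feats):
    """Channel-wise concatenation: each fused patch token carries
    CLIP_DIM + DINO_DIM features for the same spatial patch."""
    assert len(clip_feats) == len(dino_feats)
    return [c + d for c, d in zip(clip_feats, dino_feats)]

clip_feats = fake_features(NUM_PATCHES, CLIP_DIM, seed=0)
dino_feats = fake_features(NUM_PATCHES, DINO_DIM, seed=1)
fused = fuse(clip_feats, dino_feats)
print(len(fused), len(fused[0]))  # 4 patches, each 768 + 1024 = 1792 features
```

In practice the fused tokens would then pass through a learned projection into the language model; channel concatenation (rather than averaging) preserves the complementary information the Result section attributes to the two encoders.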

Result

We found that the visual encoders differed in visual reasoning performance when tested on our curated AIBL dataset (Table 1). Integrating DINOv2 with CLIP enhanced Brain‐FM's visual interpretation and instruction‐following capabilities, improving both accuracy and precision. Figure 1 illustrates Brain‐FM's ability to follow instructions and answer visual questions accurately when using DINOv2 with CLIP, compared with CLIP alone.

Conclusion

This study introduces Brain‐FM, a multimodal foundation model tailored for brain health applications that leverages visual‐text data to enhance diagnostic reasoning. Experimental results show that combining the visual features achieves high accuracy when the model is fine‐tuned on the PMC‐Dementia dataset. Brain‐FM represents a significant step toward more flexible, generalizable AI systems for brain health diagnostics and reasoning.

Details

1009240
Title
Brain‐FM: A Multimodal Foundation Model for Visual Question Answering in Brain Health Diagnostics
Author
Ebrahimkhani, Somayeh 1 ; Liu, Guimeng 1 ; Yu, Tianze 1 ; Lin, Xuling 2 ; Ng, Adeline Su Lyn 2 ; Ting, Simon Kang Seng 2 ; Hameed, Shahul 2 ; Tan, Eng King 2 ; Au, Wing Lok 2 ; Ng, Kok Pin 2 ; Cheung, Ngai‐Man 1 

1 Singapore University of Technology and Design (SUTD), Singapore, Singapore
2 National Neuroscience Institute, Singapore, Singapore
Publication title
Volume
21
Supplement
S2
Number of pages
3
Publication year
2025
Publication date
Dec 1, 2025
Section
BIOMARKERS
Publisher
John Wiley & Sons, Inc.
Place of publication
Chicago
Country of publication
United States
ISSN
1552-5260
e-ISSN
1552-5279
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-12-26
Milestone dates
2025-12-26 (publishedOnlineFinalForm)
First posting date
26 Dec 2025
ProQuest document ID
3286965925
Document URL
https://www.proquest.com/scholarly-journals/brain-fm-multimodal-foundation-model-visual/docview/3286965925/se-2?accountid=208611
Copyright
© 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2026-01-02
Database
ProQuest One Academic