Abstract

In this study, transcribed videos about personal experiences with COVID-19 were used for variant classification. The o1 LLM was used to summarize the transcripts, excluding references to dates, vaccinations, testing methods, and other variables that were correlated with specific variants but unrelated to changes in the disease. This step was necessary to effectively simulate model deployment in the early days of a pandemic when subtle changes in symptomatology may be the only viable biomarkers of disease mutations. The embedded summaries were used for training a neural network to predict the variant status of the speaker as “Omicron” or “Pre-Omicron”, resulting in an AUROC score of 0.823. This was compared to a neural network model trained on binary symptom data, which obtained a lower AUROC score of 0.769. Results of the study illustrated the future value of LLMs and audio data in the design of pandemic management tools for health systems.
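The AUROC values reported above (0.823 versus 0.769) can be read as the probability that a randomly chosen "Omicron" speaker receives a higher predicted score than a randomly chosen "Pre-Omicron" speaker. The following is a minimal sketch of that calculation, not the authors' code: the function name `auroc`, the label encoding (1 = "Omicron", 0 = "Pre-Omicron"), and the toy scores are all assumptions made for illustration.

```python
def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U statistic scaled to [0, 1]).

    labels: iterable of 0/1 class labels (assumed: 1 = "Omicron", 0 = "Pre-Omicron")
    scores: iterable of model scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count positive/negative pairs where the positive is ranked higher;
    # ties contribute half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions, not data from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(round(auroc(toy_labels, toy_scores), 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based ranking would be the usual choice for larger evaluation sets.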

Details

Title
Generative AI and unstructured audio data for precision public health
Author
Anibal, James 1; Landa, Adam 2; Nguyen, Hang 3; Daoud, Veronica 4; Le, Tram 5; Huth, Hannah 2; Song, Miranda 2; Peltekian, Alec 6; Shin, Ashley 7; Hazen, Lindsey 2; Christou, Anna 2; Rivera, Jocelyne 2; Morhard, Robert 2; Brenner, Jacqueline 2; Bagci, Ulas 8; Li, Ming 2; Bensoussan, Yael 4; Clifton, David 9; Wood, Bradford 2

1 NIH Clinical Center, Center for Interventional Oncology, Radiology and Imaging Sciences, Bethesda, USA (GRID:grid.410305.3) (ISNI:0000 0001 2194 5650); University of Oxford, Computational Health Informatics Lab, Oxford Institute of Biomedical Engineering, Oxford, UK (GRID:grid.4991.5) (ISNI:0000 0004 1936 8948)
2 NIH Clinical Center, Center for Interventional Oncology, Radiology and Imaging Sciences, Bethesda, USA (GRID:grid.410305.3) (ISNI:0000 0001 2194 5650)
3 Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam (GRID:grid.412433.3) (ISNI:0000 0004 0429 6814)
4 University of South Florida, Morsani College of Medicine, Tampa, USA (GRID:grid.170693.a) (ISNI:0000 0001 2353 285X)
5 University of South Florida, College of Engineering, Tampa, USA (GRID:grid.170693.a) (ISNI:0000 0001 2353 285X)
6 Northwestern University, Department of Computer Science, McCormick School of Engineering, Evanston, USA (GRID:grid.16753.36) (ISNI:0000 0001 2299 3507)
7 National Institutes of Health, National Library of Medicine, Bethesda, USA (GRID:grid.94365.3d) (ISNI:0000 0001 2297 5165)
8 Northwestern University, Feinberg School of Medicine, Chicago, USA (GRID:grid.16753.36) (ISNI:0000 0001 2299 3507)
9 University of Oxford, Computational Health Informatics Lab, Oxford Institute of Biomedical Engineering, Oxford, UK (GRID:grid.4991.5) (ISNI:0000 0004 1936 8948)
Pages
19
Publication year
2025
Publication date
Dec 2025
Publisher
Nature Publishing Group
e-ISSN
3005-1959
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3225849528
Copyright
Copyright Nature Publishing Group Dec 2025