
Abstract

Background: While the COVID-19 pandemic has prompted massive discussion of available medications on social media, traditional studies have focused only on limited aspects, such as public opinions, and have suffered from reporting biases, inefficiency, and long collection times.

Objective: Harnessing drug-related data posted on social media in real time can offer insights into how the pandemic affects drug use and help monitor misinformation. This study aimed to develop a natural language processing (NLP) pipeline tailored to the analysis of social media discourse on COVID-19–related drugs.

Methods: This study constructed a full pipeline for analyzing COVID-19–related drug tweets, using pretrained language model–based NLP techniques as the backbone. The pipeline comprises 4 core modules: (1) named entity recognition and normalization, which identifies medical entities in relevant tweets and maps them to uniform medication names for time trend analysis; (2) target sentiment analysis, which reveals the sentiment polarities associated with those entities; (3) topic modeling, which surfaces the underlying themes discussed by the population; and (4) drug network analysis, which mines potential adverse drug reactions (ADR) and drug-drug interactions (DDI). The pipeline was deployed to analyze tweets related to the COVID-19 pandemic and drug therapies posted between February 1, 2020, and April 30, 2022.
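The normalization and time-trend steps described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the study uses a pretrained language model for named entity recognition, whereas here the raw entity mentions are assumed to be already extracted, and normalization is done with a small hypothetical synonym table (`NORMALIZATION_TABLE`) rather than whatever resource the study actually used. The helper names `normalize` and `monthly_trends` are likewise hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Optional

# Hypothetical synonym table: lowercased surface forms -> canonical drug name.
# Illustrative assumption only; not the study's normalization resource.
NORMALIZATION_TABLE = {
    "hcq": "hydroxychloroquine",
    "hydroxychloroquine": "hydroxychloroquine",
    "plaquenil": "hydroxychloroquine",
    "ivermectin": "ivermectin",
    "ivm": "ivermectin",
    "remdesivir": "remdesivir",
    "veklury": "remdesivir",
    "zinc": "zinc",
    "vit d": "vitamin d",
    "vitamin d": "vitamin d",
}


def normalize(mention: str) -> Optional[str]:
    """Map a raw entity mention to a canonical drug name, or None if unknown."""
    return NORMALIZATION_TABLE.get(mention.strip().lower())


def monthly_trends(tweets):
    """Count tweets per (month, canonical drug).

    `tweets` is an iterable of (posted_on: date, mentions: list[str]) pairs,
    where `mentions` are raw entity strings assumed to come from an upstream
    NER model. Each drug is counted at most once per tweet.
    """
    counts = defaultdict(Counter)
    for posted_on, mentions in tweets:
        month = posted_on.strftime("%Y-%m")
        # A set collapses synonyms and repeats within one tweet to one count.
        drugs = {d for d in (normalize(m) for m in mentions) if d is not None}
        counts[month].update(drugs)
    return counts
```

For example, `monthly_trends([(date(2020, 3, 20), ["HCQ", "zinc"]), (date(2020, 3, 28), ["Plaquenil"])])` attributes both March 2020 tweets to hydroxychloroquine. Counting each drug once per tweet keeps a single tweet that repeats a drug name from inflating its trend; whether the study counted mentions or tweets is not stated in this abstract.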

Results: From a dataset comprising 169,659,956 COVID-19–related tweets from 103,682,686 users, our named entity recognition model identified 2,124,757 relevant tweets from 1,800,372 unique users, as well as the top 5 most-discussed drugs: ivermectin, hydroxychloroquine, remdesivir, zinc, and vitamin D. Time trend analysis revealed that the public focused mostly on repurposed drugs (ie, hydroxychloroquine and ivermectin) and least on remdesivir, the only officially approved drug among the 5. Sentiment analysis of the top 5 most-discussed drugs revealed that public perception was predominantly shaped by celebrity endorsements, media hot spots, and governmental directives rather than by empirical evidence of drug efficacy. Topic modeling identified 15 general topics across all drug-related tweets, with “clinical treatment effects of drugs” and “physical symptoms” emerging as the most frequently discussed. Co-occurrence matrices and complex network analysis further identified emerging patterns of DDI and ADR that could inform public health surveillance and help safeguard the public's use of medicines.
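The abstract does not specify how the co-occurrence matrices were built, so the following is only a minimal sketch under explicit assumptions: each tweet is reduced to a set of normalized entity labels, labels carry a hypothetical `"drug:"` or `"symptom:"` prefix, and the network measure shown is weighted degree (node strength), one simple measure among the many used in complex network analysis. The function names `cooccurrence_counts`, `weighted_degree`, and `split_candidates` are hypothetical. Drug-drug edges are treated as candidate DDI signals and drug-symptom edges as candidate ADR signals.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(tweet_entities):
    """Count how many tweets mention each unordered pair of entities.

    `tweet_entities` is an iterable of sets of normalized labels such as
    {"drug:ivermectin", "symptom:nausea"}. The "type:name" prefix convention
    is a hypothetical choice for this sketch.
    """
    pairs = Counter()
    for entities in tweet_entities:
        # sorted() gives each unordered pair one canonical (a, b) key
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)] += 1
    return pairs


def weighted_degree(pairs):
    """Sum of co-occurrence weights on each node's edges (node strength)."""
    strength = defaultdict(int)
    for (a, b), w in pairs.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


def split_candidates(pairs, min_count=1):
    """Split edges into drug-drug (candidate DDI) and drug-symptom (candidate ADR)."""
    ddi, adr = {}, {}
    for (a, b), w in pairs.items():
        if w < min_count:
            continue
        kinds = {a.split(":", 1)[0], b.split(":", 1)[0]}
        if kinds == {"drug"}:
            ddi[(a, b)] = w
        elif kinds == {"drug", "symptom"}:
            adr[(a, b)] = w
    return ddi, adr
```

On toy input where two tweets both mention `drug:hydroxychloroquine` and `symptom:arrhythmia`, that pair receives weight 2 and is returned as a candidate ADR edge with `min_count=2`. Such edges reflect co-mention only; they flag pairs for follow-up rather than confirming an interaction or reaction.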

Conclusions: This study shows that an NLP-based pipeline can be a robust tool for large-scale public health monitoring and can offer valuable supplementary data for traditional epidemiological studies concerning DDI and ADR. The framework presented here aspires to serve as a cornerstone for future social media–based public health analytics.

Details

Title
Characterizing Public Sentiments and Drug Interactions in the COVID-19 Pandemic Using Social Media: Natural Language Processing and Network Analysis
Author
Li, Wanxin; Hua, Yining; Zhou, Peilin; Zhou, Li; Xu, Xin; Yang, Jie
Publication title
Volume
27
First page
e63755
Publication year
2025
Publication date
2025
Section
Public (e)Health, Digital Epidemiology and Public Health Informatics
Publisher
Gunther Eysenbach MD MPH, Associate Professor
Place of publication
Toronto
Country of publication
Canada
e-ISSN
1438-8871
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-03-05
Milestone dates
2024-06-28 (Preprint first published); 2024-06-28 (Submitted); 2024-12-19 (Revised version received); 2025-01-25 (Accepted); 2025-03-05 (Published)
First posting date
05 Mar 2025
ProQuest document ID
3222368101
Document URL
https://www.proquest.com/scholarly-journals/characterizing-public-sentiments-drug/docview/3222368101/se-2?accountid=208611
Copyright
© 2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-07
Database
2 databases
  • Coronavirus Research Database
  • ProQuest One Academic