Background: The opioid epidemic in the United States remains a major public health concern, with opioid-related deaths increasing more than 8-fold since 1999. Chronic pain, affecting 1 in 5 US adults, is a key contributor to opioid use and misuse. While previous research has explored clinical and behavioral predictors of opioid risk, less attention has been given to large-scale linguistic patterns in public discussions of pain. Social media platforms such as X (formerly Twitter) offer real-time, population-level insights into how individuals express pain, distress, and coping strategies. Understanding these linguistic markers matters because they can reveal underlying psychological states, perceptions of health care access, and community-level opioid risk factors, offering new opportunities for early detection and targeted public health response.
Objective: This study aimed to examine linguistic markers of pain communication on the social media platform X and assess whether language patterns differ among US states with high and low opioid mortality rates. We also evaluated the predictive power of these linguistic features using machine learning and identified key thematic structures through semantic network analysis.
Methods: We collected 1,438,644 pain-related tweets posted between January and December 2021 using tweepy and snscrape. Tweets from 2 states with high opioid mortality (Ohio and Florida) and 2 states with low opioid mortality (South Dakota and North Dakota) were selected, resulting in 31,994 tweets from high-death states (HDS) and 750 tweets from low-death states (LDS). Six machine learning algorithms (random forest, k-nearest neighbor, decision tree, naive Bayes, logistic regression, and support vector machine) were applied to predict state-level opioid mortality risk based on linguistic features derived from Linguistic Inquiry and Word Count (LIWC). The Synthetic Minority Oversampling Technique (SMOTE) was used to address class imbalance. Semantic network analysis was conducted to visualize co-occurrence patterns and conceptual clustering.
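To illustrate the oversampling step, the following is a minimal sketch of the core idea behind the Synthetic Minority Oversampling Technique: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbors. It is not the authors' pipeline. The function name `smote_like_oversample`, the parameter values, and the toy feature vectors are all assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy, made-up feature vectors standing in for per-tweet linguistic scores
# from the minority (low-death state) class.
lds = [[1.0, 4.0], [1.5, 3.5], [2.0, 4.5], [1.2, 4.2]]
new_points = smote_like_oversample(lds, n_new=6, k=2)
balanced_minority = lds + new_points
print(len(balanced_minority))
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region those samples already occupy. Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.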
Results: The random forest model demonstrated the strongest predictive performance, with an accuracy of 94.69%, balanced accuracy of 94.69%, κ of 0.89, and an area under the curve of 0.95 (P<.001). Tweets from HDS contained significantly more affective pain words (t(31,992)=10.84; P<.001; Cohen d=0.12), health care access references, and expressions of distress. LDS tweets showed greater use of authenticity markers (t(31,992)=−10.04; P<.001) and proactive health-seeking language. Semantic network analysis revealed denser discourse in HDS (density=0.28) focused on distress and barriers to care, while LDS discourse emphasized recovery and optimism.
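As a reading aid for the classification metrics above, the sketch below computes accuracy, balanced accuracy, and Cohen κ from a binary confusion matrix. The counts are hypothetical and are not the study's confusion matrix; the function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return (accuracy, balanced accuracy, Cohen kappa) for a 2x2 matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    balanced_accuracy = (sensitivity + specificity) / 2
    # Chance agreement from the actual and predicted class marginals.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, balanced_accuracy, kappa


# Hypothetical evaluation set with 1000 examples per class.
acc, bal_acc, kappa = binary_metrics(tp=950, fn=50, fp=56, tn=944)
print(round(acc, 4), round(bal_acc, 4), round(kappa, 4))
```

When the two classes are equally represented, accuracy and balanced accuracy coincide and κ reduces to 2 × accuracy − 1. The reported values (accuracy and balanced accuracy both 94.69%, κ of 0.89) are consistent with that pattern, although the abstract does not state the class composition of the evaluation set.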
Conclusions: Our findings demonstrated that linguistic markers in publicly shared pain-related discourse show distinct and predictable differences across regions with varying opioid mortality risks. These linguistic patterns reflect underlying psychological, social, and structural factors that contribute to opioid vulnerability. Importantly, they offer a scalable, real-time resource for identifying at-risk communities. Harnessing social media language analytics can strengthen early detection systems, guide geographically targeted public health messaging, and inform policy efforts aimed at reducing opioid-related harm and improving pain management equity.
Details
Software;
Application programming interface;
Public health;
Risk factors;
Communication;
Mortality;
Coping strategies;
Opioids;
Social networks;
Clinical research;
Mass media;
Health status;
Health care access;
Narcotics;
Machine learning;
Fatalities;
Help seeking behavior;
Death & dying;
Optimism;
Social media;
Imbalance;
Chronic pain;
Classification;
Epidemics;
Natural language processing;
Linguistics;
Mortality rates;
Clustering;
Health services utilization;
Network analysis;
Comorbidity;
Semantics;
Psychological distress;
Mental health;
Accuracy;
Language patterns;
Coping;
Risk;
Semantic analysis;
States;
Discourse analysis;
Computer mediated communication;
Deaths;
Health services;
Mental health services;
Sociolinguistics;
Decision making;
Adults;
North and South;
Language;
Social factors;
Pain;
Health care