Introduction
Approximately 16 million Americans are informal unpaid caregivers for family members or close friends with Alzheimer’s disease and related dementias (ADRD)1. Caring for people with ADRD is usually long-term and demanding. Nearly 60% of caregivers of people with ADRD have had caregiving responsibilities for over four years1, and 60% anticipate continued caregiving in the next five years. About one in three of these informal caregivers is an older adult (age 65 or older)1. Studies report that the level of caregiving burden for people with ADRD is higher than for other diseases2,3, and burdensome caregiving duties hinder caregivers from caring for themselves4. Thus, caregivers are often at high risk of mental distress, including depression, anxiety, and poor quality of life5,6. Although the burden of caregiving and related mental distress have been studied in this population7–9, investigations of mental health using unfiltered social media data from caregivers of ADRD patients are rare. Listening to caregivers’ voices is crucial to supporting them appropriately because those voices may represent their values and needs better than what clinicians prioritize10.
Qualitative research has been employed to listen to the caregivers’ needs, yet it has some limitations. First, caregiver interviews are challenging to conduct on a large scale. In addition, recruiting caregivers to take part in research studies can be difficult even with financial incentives, as they often have busy caregiving schedules11. Furthermore, caregivers may be reluctant to share struggles with investigators12. Therefore, obtaining genuine insights into the real-life concerns and mental distress of caregivers of ADRD patients through in-depth interviews can be challenging.
Online caregiving forums could be a promising data source for investigating caregivers’ genuine struggles because of the anonymity the forums provide, particularly for private concerns that caregivers are unsure where or with whom to discuss12. Natural language processing/machine learning (NLP/ML)-based topic modeling approaches have been increasingly used to extract information from large volumes of user-generated data, including posts from online forums, to examine various health issues (e.g., suicidal ideation and cochlear implants)10,13–16. The direct voices of informal caregivers obtained from unfiltered discussions would contain rich and valuable information that is not easily available through conventional means, such as clinical settings or surveys. A detailed understanding of the burdens and mental stressors of caregivers may help customize need-based mental health support for caregivers, improve caregiver mental health outcomes and quality of life17, and ultimately improve the quality of care for ADRD patients18,19.
To our knowledge, few prior studies have used systematic text-mining methods to analyze the discussions and sentiment of a large, text-based forum of caregivers. We hypothesize that (1) the online caregiving forum provides high-quality data that contain insights into mental distress and related stressors of informal caregivers of people with ADRD, and (2) the NLP/ML method could serve as a promising technique for large-scale thematic analyses of online forum data. Thus, this study aimed to identify novel trends in caregivers’ mental stressors and care needs qualitatively and to validate the results of our NLP/ML analysis through comparison with both our qualitative analysis of the online caregiving forum and the existing literature. The findings have the potential to improve our understanding of informal caregivers’ genuine concerns and mental distress in more detail, enabling us to prepare tailored support for this vulnerable population.
Methods
Data source: online caregiving forum
ALZConnected.org is a USA-based website for ADRD patients and their caregivers that is officially supported by the Alzheimer’s Association20. The caregiver forum of ALZConnected.org is where caregivers, mostly informal caregivers of people with ADRD, actively seek and provide advice and information on caring for their loved ones4. The forum requires official registration as a member to write an original post, and only members may reply to posts. However, all forum posts are public and can be viewed by anyone without registration. This public forum was previously used for content analysis of other health topics among caregivers and ADRD patients21. The Institutional Review Board of Stanford University deemed this study exempt from human subjects research.
Data extraction: web-scraping
Web-scraping was used to retrieve public forum posts from March 1, 2018 to February 28, 2022. We scraped only publicly available information, including the title, main body, and replies to the body of the posts, the date the post was added, the username of the poster, and the date the user joined the forum. We pre-specified ten mental health keywords to selectively extract posts that contained caregivers’ mental distress and related stressors: depression, anxiety health, little interest, hopelessness, nervousness, worrying, loneliness, mental health, and mental distress. The ten keywords were adapted from the terms of a widely used mental distress screening tool, the Patient Health Questionnaire-4, as well as commonly used terms for mental distress in online health forums16. We programmed the scraper to retrieve posts that contained any of the listed keywords, and the extracted information was imported into a Microsoft Excel spreadsheet.
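The keyword-based selection described above can be sketched as a simple filter. This is a minimal illustration, not the study's scraper: the keyword list is only a subset of the terms named in the text, the post fields (`title`, `body`) and the example posts are hypothetical, and the choice of case-insensitive, whole-word matching is an assumption that the text does not specify.

```python
import re

# Illustrative subset of the keywords named in the text (not the full study list).
KEYWORDS = ["depression", "little interest", "hopelessness", "loneliness", "mental health"]

# Assumption: case-insensitive matching at word boundaries.
# With \b, "depression" matches "Depression" but not "depressions".
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def contains_keyword(post):
    """Return True if the post's title or body mentions any keyword."""
    text = f"{post.get('title', '')} {post.get('body', '')}"
    return PATTERN.search(text) is not None

def filter_posts(posts):
    """Keep only posts that mention at least one keyword."""
    return [p for p in posts if contains_keyword(p)]

# Hypothetical posts, for illustration only.
posts = [
    {"title": "Emotionally tired", "body": "My depression is getting worse."},
    {"title": "Question about facilities", "body": "How do we pick a memory care home?"},
    {"title": "Mental health day", "body": "Taking a break."},
]
selected = filter_posts(posts)
```

A stricter or looser matching rule (for example, substring matching so that plural forms are caught) would change which posts are retained, so the rule should be reported alongside the keyword list.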
Feasibility of studying mental health care needs of caregivers using online forum data
The feasibility was determined based on whether we were able to investigate informal caregivers’ mental distress, related stressors, and needed support through the online caregiving forum data, both qualitatively and quantitatively. The titles of the posts were screened for this purpose given that they typically encapsulate the essence of the posts (e.g., “Emotionally tired,” “Feeling frustrated,” “Caregiver recovery program?”). Three independent researchers reviewed the titles for the feasibility assessment (JK, YC, and ZRC). The primary researcher (JK) reviewed and categorized the posts according to whether the title was intended to (1) express the caregivers’ own mental distress or negative emotions, or (2) seek advice or support for their own mental and emotional distress. If a post qualified for both, it was assigned to the seeking advice category. If the title did not provide enough information to determine the category, the body of the post was read as needed. The secondary researcher (YC) reviewed and confirmed the categorization, adopting a member-checking approach. To further ensure the quality of categorization, the third researcher (ZRC) iteratively validated it by randomly selecting 100 titles from the total posts to review and categorize independently. The categorization of the third researcher (ZRC) was compared with the original categorization (JK and YC), and intercoder reliability was reported in terms of Cohen’s kappa.
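For readers unfamiliar with Cohen’s kappa, the statistic corrects the observed agreement between two coders for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). The sketch below is a minimal, standard-library implementation; the category labels and the ten example ratings are hypothetical and are not the study data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical ratings for ten titles ("express" vs "seek"), illustration only.
rater_a = ["express"] * 8 + ["seek"] * 2
rater_b = ["express"] * 7 + ["seek"] * 3
kappa = cohens_kappa(rater_a, rater_b)
```

In this toy example the coders agree on 9 of 10 titles (p_o = 0.90) and chance agreement is p_e = 0.62, so kappa is about 0.74, noticeably lower than the raw agreement.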
Data analysis using natural language processing (NLP)
Natural language processing (NLP) is a subarea of linguistics, computer science, and artificial intelligence and is widely used for large data analysis in health-related fields22,23. A machine learning (ML)-based NLP algorithm was used to perform three NLP techniques: tokenization, lemmatization and stemming, and topic modeling. For text pre-processing, we used functions from NLTK (Natural Language Toolkit). First, NLTK Word-Tokenize split texts into individual words and punctuation marks, called tokens (e.g., Thanks for any insight here. → ‘Thanks’, ‘for’, ‘any’, ‘insight’, ‘here’, ‘.’). Second, we removed stopwords (e.g., ‘the’, ‘of’, ‘to’) listed in the NLTK corpus, along with additional stopwords defined at a later stage. This process also involved the removal of articles and punctuation. Third, an NLTK lemmatizer (e.g., WordNetLemmatizer) was used to reduce the words to their most basic form (e.g., being → be, walks → walk, Thank → thank). Fourth, with these pre-processed words, topic modeling was conducted using Latent Dirichlet Allocation (LDA). LDA is a three-level hierarchical Bayesian model for collections of discrete data, including text corpora24. When LDA is applied to text modeling, it can generate a set of topic probabilities, in which each topic probability could represent a single prominent topic. Using the ‘gensim’ and ‘pyLDAvis’ libraries, the LDA model creates a group of topics, each containing the words most likely to belong to it. For the topic modeling, we built bigram and trigram models to capture two- or three-word groups that commonly appear together, which aids interpretation of the topic modeling results.
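The pre-processing steps above (tokenization, stopword removal, lemmatization, and detection of frequent word pairs) can be illustrated with a small standard-library sketch. It is a stand-in, not the study pipeline: the regular-expression tokenizer, the tiny stopword set, the lemma lookup table, and the count-based bigram rule are all simplifying assumptions that replace the NLTK functions and the bigram and trigram models named in the text, and the LDA step itself is not reproduced here.

```python
import re
from collections import Counter

# Toy resources (assumptions); the study used the NLTK stopword corpus and lemmatizer.
STOPWORDS = {"the", "of", "to", "for", "any", "here", "a", "is"}
LEMMAS = {"being": "be", "walks": "walk", "thanks": "thank"}

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)

def preprocess(text):
    """Lowercase, drop punctuation and stopwords, then map words to base forms."""
    tokens = [t.lower() for t in tokenize(text)]
    tokens = [t for t in tokens if t.isalpha() and t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]

def frequent_bigrams(docs, min_count=2):
    """Return adjacent word pairs that occur at least min_count times overall."""
    counts = Counter()
    for doc in docs:
        counts.update(zip(doc, doc[1:]))
    return {pair for pair, c in counts.items() if c >= min_count}

# Hypothetical sentences, for illustration only.
texts = [
    "Memory care is expensive.",
    "We toured a memory care facility.",
    "She walks to the facility.",
]
docs = [preprocess(t) for t in texts]
bigrams = frequent_bigrams(docs)
```

With these toy inputs, the pair ("memory", "care") is the only bigram that recurs, which shows why phrase detection helps interpretation: "memory care" carries a meaning that neither word conveys alone.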
Validity assessment of the topics generated by the NLP/ML-based topic modeling
The validity assessment had two primary objectives. The first was to validate the performance of the NLP/ML model, that is, whether it could generate the representative topics discussed in the online forum. For this purpose, the topics generated by the NLP/ML topic modeling were compared with the themes qualitatively generated by human readers to see whether the NLP/ML-modeled topics matched the qualitative themes. As a first step for this validation assessment, two researchers labeled the NLP-modeled topics. The primary researcher (SJR) interpreted and labeled the topics using the word clouds from the NLP model. The secondary researcher (SO) reviewed and confirmed the labels based on the member-checking approach. Disagreements were resolved through discussion. As a second step, thematic analysis was applied, which has been widely used to identify and analyze patterns of themes, topics, or ideas from online data for individuals with ADRD and their caregivers21. Two trained researchers qualitatively analyzed the text data; neither was involved in labeling the NLP topics, to avoid potential bias. The primary researcher (JK) read the posts carefully to generate tentative themes and refined the themes after multiple reviews. Out of the posts containing at least one mental health keyword, we initially selected 100 posts at random to create the initial theme codebook, then progressively added posts to the analysis until no new topics could be discerned. The secondary researcher (YC) independently read randomly selected posts to review, check, and further refine the initial codes. The two coders met to reconcile conflicts through discussion and finalize the codebook. Then, to validate the codes, the two researchers independently read the posts (body and title) and applied the identified codes to approximately 3% of the posts. The two coders discussed conflicts for reconciliation and theme confirmation, and a meeting with the third coder was planned in case the two primary coders were unable to resolve a disagreement.
Upon completion of the thematic analysis, a trained researcher (ZRC), who was involved in neither the thematic analysis nor the NLP topic labeling, assessed the validity of the NLP topic modeling approach for extracting key information from the large text data, as done previously25. The NLP-modeled topics were compared and matched with the themes from the qualitative analysis, and the comparison was reported as a table10.
The second objective was to validate whether the online caregiving forum content aligns with existing knowledge on caregivers’ mental distress and related stressors. For this purpose, we compared the NLP/ML-modeled topics and themes with a well-developed framework for caregiving burden and strain among informal caregivers of people with ADRD4. Two researchers (ZRC and JK), who know the context of the online caregiving forum well, independently examined whether the online forum content (NLP/ML topics and themes) was consistent with the previously reported mental distress and related stressors of informal caregivers. The assessments were compared and reconciled through discussion, and the agreed validity assessment was reported as a comparison table.
Results
Description of the data source
We extracted posts from a period spanning from March 1, 2018, to February 28, 2022 (Fig. 1). The total number of posts collected was 60,812, composed of 8244 original posts and 52,568 reply posts from 5415 unique users. Among these, 5848 posts contained one or more of the ten designated mental health keywords and were used for topic modeling and qualitative analysis. On average, each unique user made 1.52 original posts. There was an average of 6.4 replies per original post.
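The descriptive figures in this paragraph follow directly from the reported counts, as the short check below illustrates. The variable names are ours, and the keyword-post share on the last line is a derived value that the text does not report.

```python
# Counts reported in the text.
original_posts = 8244
reply_posts = 52568
unique_users = 5415
keyword_posts = 5848

total_posts = original_posts + reply_posts              # 60,812 posts in total
original_per_user = original_posts / unique_users       # about 1.52 original posts per user
replies_per_original = reply_posts / original_posts     # about 6.4 replies per original post
keyword_share = keyword_posts / total_posts             # derived here: roughly 9.6% of all posts
```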
Fig. 1. Study design and data flow chart. [Image not available.]
A feasibility assessment was done to examine whether we could study caregivers’ mental distress and related stressors using online forum data; a validity assessment was done to evaluate (1) the validity of the NLP/ML method (NLP/ML-generated topics vs qualitatively defined themes) and (2) the content validity of the online forum (NLP/ML-generated topics and qualitatively defined themes vs the existing caregiver framework).
Feasibility of studying mental health care needs of caregivers using online forum data
We screened the titles of all posts that contained at least one mental health keyword in either the title or the body. Of a total of 5848 posts (original or reply), 963 posts were identified by three researchers as eligible for studying mental distress or the situations that could elicit mental distress in informal caregivers of people with ADRD. Approximately 93% (894 posts out of 963) were considered posts intended to express negative emotions, and the rest (7%, 69 posts out of 963) were categorized as posts seeking specific advice or resources to cope with caregiver distress. Cohen’s kappa was 0.90 (almost perfect agreement). Table 1 presents example titles that appeared repeatedly among the eligible titles.
Table 1. Example titles that showed the mental health care needs of informal caregivers of a person with ADRD from the online discussion forum (a)
1. Expressing their emotions or mental distress to their online peers |
• “In desperate need to vent.” • “I’m feeling angry guilty overwhelmed and like I’m failing.” • “I find myself wanting to almost let go or at least compartmentalize what is happening.” • “Hate being mad all the time.” • “Frustrated anxious depressed.” • “Exhausted and angry—venting.” • “Caregiver is exhausted/in danger.” |
2. Seeking some advice on how to manage their emotions or self-care |
• “How do you manage your emotions knowing dementia is a long goodbye? + What’s the point of an early diagnosis when there’s no treatment? + more questions” • “How can I deal with my anger?” • “In a deep depression. Don’t know how to get out. Anyone experienced this?” • “Finding a way to take care of myself-long.” • “Caregiver selfcare.” |
(a) Titles were selected for this table when those or nearly the same ones appeared at least three times.
Results of the NLP/ML-based topic modeling
The NLP/ML-based topic modeling created the most salient eight topics out of the original posts, providing ten keywords for each topic (Table 2). The eight topics represent the most significant and frequent discussions from the online forum, including caregiving duty and burden (e.g., Topic 2: talk, call, work, sit, sleep, and eat), coping strategies, and caregiver support (e.g., Topic 5: heart, love, peace, light, good, and thought), and institutionalization (e.g., Topic 7: move, place, facility, home, house, and pay). These ten keywords were visualized as word clouds in Supplementary 1.
Table 2. Representative topics of informal caregivers’ concern and distress from online discussion forum generated by NLP/ML topic modeling
Topic | The likely scenarios of caregivers’ concerns and distress-related issues |
---|---|
Topic 1 | • Found time to do housework and make calls. • Eating issues started. |
Topic 2 | • Talk, work, call, sit, and have the loved one eat. • Need to go back to bed and sleep. |
Topic 3 | • Recall how my loved one used to live back when they didn’t have dementia. • With dementia, the loved one does not remember how to drive a car, give a call, come back home, and live alone. |
Topic 4 | • Celebrate victories for loved ones, such as showering and going to the dentist, hoping the loved one forgives me for doing something not pleasant. • The righteousness enables me to do good and give care to loved ones. |
Topic 5 | • Try to keep good thoughts and words like love and peace in my heart. |
Topic 6 | • Can someone give good advice on caring for a loved one with dementia from home? • Dementia and caregiving life started. People visit home and make phone calls. |
Topic 7 | • Worried about moving a loved one to a care facility. • Do not know how I will pay for it. |
Topic 8 | • Need to find good care for loved ones with ADRD. • Want to make sure that loved one is monitored/watched well. |
The main themes of the online caregiver forum created by the qualitative analysis
Approximately 3% of the total posts were analyzed, leading to the identification of two primary themes with eight subthemes (for care recipients: symptoms, medications, relocation, care duty share, new diagnosis, and conversation strategy with a person with dementia (PWD); for caregivers: caregiver burden and caregiver support) (Table 3).
Table 3. Representative themes of informal caregivers’ primary issues from online discussion forum generated by qualitative analysis
For care recipient |
---|
Theme 1. Symptom |
• Shared the loved one (LO)’s symptoms (e.g., depression, anxiety, hallucination, delirium, loneliness, and obsessive behaviors). • Sought advice and information about their commonality, causality, and mitigation strategies. |
Theme 2. Medication |
• Discussed the suggested medications in easing symptoms such as pain, depression/anxiety, and hallucinations. • Advice for appropriate usage, timing, the pros and cons of medication, and strategies to ensure LO’s adherence to the medication, and have them avoid dangerous mixtures of medications that could trigger side effects. |
Theme 3. Relocation |
• The LO’s transition to memory care, assisted living (institutionalization), or relocating to the caregiver’s neighborhood. • For institutionalization: optimal timing, how to find a good place, pros (e.g., help alleviate the LO’s loneliness) and cons, preparations for moving and transitioning. • Care strategies after institutionalization. |
Theme 4. Care duty share |
• Often shared the conflicts regarding the division of caregiving duties among siblings and other family members. • Often expressed frustrations because their siblings or other family members showed selfish attitudes, not caring for the LO and the primary caregiver, and used the forum as an outlet for expressing confusion and anger. • Sought advice on increasing other family members’ involvement. |
Theme 5. Diagnosis |
• Often introduced that they are new to the community and sought advice regarding their fresh experiences with newly diagnosed situations. • Other community members offered detailed information in response to specific questions and recommended that new members explore previous posts containing similar questions and thoughtful replies. |
Theme 6. Conversation strategy with PWD |
• Discussed how to communicate effectively with people with dementia (PWD). • Sought guidance on persuading their LO to seek professional medical assessment for potential ADRD symptoms or related prescription medications. Often, the LOs deny exhibiting any ADRD-like symptoms, despite these being apparent to their adult children, leading to furious refusal of medical appointments. • Expressed challenges in devising appropriate responses or conversational strategies in instances when the PWD displayed confusion about their personal identities, deceased spouses or parents, or directed abusive words at caregivers due to misunderstanding. |
For caregiver |
Theme 7. Caregiver burden |
• Often emotional in their posts, expressing mixed feelings with love, frustration, exhaustion, and guilt (e.g., “miss my mom before the disease,” “feel guilty not wanting to care for my mom,” “feeling people don’t care about caregivers’ lives,” “I hate my life and I want to either her to die or myself”). • Shared their struggles with mental health (e.g., depression, anxiety, and sleep deprivation). • Overwhelmed due to multiple responsibilities: caregiving and household duties for the LO, taking care of own family, particularly challenging as the LO often desired their continuous presence and companionship. • The lack of time for self-care further exacerbated their emotional distress. |
Theme 8. Caregiver support |
• Encouragement was the most frequently witnessed attitude in the forum in response to the concerns and mental health-related posts (e.g., “You are not alone,” “Prayers for you and your mom,” “Hanging in there we are all thinking good things for you”). |
Each bullet indicates a sub-category of the theme
Validity of the NLP/ML topic modeling
All eight major topics generated from the NLP/ML topic modeling successfully aligned with the qualitatively defined themes (Table 4). However, there was one difference between the NLP/ML topics and themes. Medication (Theme 2) was not matched with any of the eight NLP/ML topics.
Table 4. The comparisons of informal caregivers’ concern and distress between the NLP/ML generated topics and qualitatively defined themes
Topic | Topics generated by the NLP/ML topic modeling (computer reads) | Themes generated by qualitative analysis (human reads) |
---|---|---|
Topic 1 | Found time to do housework and make calls. Eating issues started. | Theme 1. Symptoms; Theme 7. Caregiver burden |
Topic 2 | Talk, work, call, sit, and have the loved one eat. Need to go back to bed and sleep. | Theme 7. Caregiver burden |
Topic 3 | Recall how my loved one used to live back when they didn’t have dementia. With dementia, the loved one does not remember how to drive a car, give a call, come back home, and live alone. | Theme 1. Symptoms; Theme 7. Caregiver burden |
Topic 4 | Celebrate victories for loved ones, such as showering and going to the dentist, hoping the loved one forgives me for doing something not pleasant. The righteousness enables me to do good and give care to loved ones. | Theme 6. Conversation strategy with PWD; Theme 7. Caregiver burden |
Topic 5 | Try to keep good thoughts and words like love and peace in my heart. | Theme 7. Caregiver burden; Theme 8. Caregiver support |
Topic 6 | Can someone give good advice on caring for a loved one with dementia from home? Dementia and caregiving life started. People visit home and make phone calls. | Theme 5. Diagnosis; Theme 7. Caregiver burden |
Topic 7 | Worried about moving a loved one to a care facility. Do not know how I will pay for it. | Theme 3. Relocation; Theme 7. Caregiver burden |
Topic 8 | Need to find good care for loved ones with ADRD. Want to make sure that loved one is monitored/watched well. | Theme 4. Care duty share; Theme 7. Caregiver burden |
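The coverage check summarized in Table 4 can be expressed as a small set comparison. The mapping below transcribes the topic-to-theme matches from Table 4 and uses the theme names from Table 3; the variable names are ours, and the sketch only restates the reported matching rather than reproducing how the researchers arrived at it.

```python
# Topic number -> matched theme numbers, transcribed from Table 4.
TOPIC_TO_THEMES = {
    1: {1, 7}, 2: {7}, 3: {1, 7}, 4: {6, 7},
    5: {7, 8}, 6: {5, 7}, 7: {3, 7}, 8: {4, 7},
}

# Theme numbers and names, transcribed from Table 3.
THEME_NAMES = {
    1: "Symptom", 2: "Medication", 3: "Relocation", 4: "Care duty share",
    5: "Diagnosis", 6: "Conversation strategy with PWD",
    7: "Caregiver burden", 8: "Caregiver support",
}

# Every topic should map to at least one theme.
all_topics_matched = all(len(themes) > 0 for themes in TOPIC_TO_THEMES.values())

# Themes that no topic maps to.
covered = set().union(*TOPIC_TO_THEMES.values())
unmatched = [THEME_NAMES[t] for t in sorted(set(THEME_NAMES) - covered)]
```

Run on the Table 4 mapping, every topic has at least one matching theme and the only theme left uncovered is Medication, consistent with the difference noted in the text. The mapping also shows that Caregiver burden (Theme 7) is matched to all eight topics.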
Validity of online discussion forum data
The online discussion forum content, which was represented by the eight NLP/ML-generated topics validated through manually defined themes, was successfully aligned with the existing framework of caregiver concerns and stress4. Specifically, these topics were matched with six primary categories of the framework: physical and psychological morbidity, social isolation, lack of support, nursing home admission, and two predictors and protectors of caregiver distress (disease severity, and perception and experience of the caregiving role) (Table 5).
Table 5. Comparison of caregiving stress and mental distress of family caregivers of a person with ADRD from the online discussion forum with existing framework4
Common concern and distress-related issues of informal caregivers of person with ADRD | |
---|---|
From existing framework | From online caregiving discussion forum |
Physical and psychological morbidity • Poor sleep quality • Chronic disease | Theme 7. Caregiver burden • Sleep deprivation • Suffering from anxiety/depression |
Social isolation • Sacrifice their leisure pursuits • Restrict time with own family and friends • Give up/reduce employment | Theme 7. Caregiver burden • Give up employment • Lack of time to deal with multiple duties • Lack of self-care |
Lack of support • Instrumental (e.g., help with daily living) • Emotional • Informational (e.g., knowledge from the health professionals or someone who has experienced similar situations) | Theme 4. Care duty share • Conflicts between siblings due to care duty share for a parent with ADRD Theme 7. Caregiver burden • Overwhelmed by multiple responsibilities • Emotional distress (e.g., frustration) Theme 8. Caregiver support • Searching for peer support groups • Encouraging each other • Seeking information from health professionals and online forum |
Nursing home admission • Guilt, anxiety/depression • Financial burden | Theme 3. Relocation • Preparation for institutionalization Theme 7. Caregiver burden • Mixed emotions (e.g., frustration for being blamed by the LO, happy to have a break but guilty for happiness) |
Predictors and protectors of caregiver distress: Disease severity • More neuropsychiatric symptoms • Behavioral problems | Theme 1. Symptom Theme 2. Medication Theme 6. Conversation strategy with PWD • How to care for psychiatric symptoms of parents with ADRD, behavioral and medical advice, especially on the conversational strategy |
Predictors and protectors of caregiver distress: Perception and experience of caregiving role • Low sense of confidence | Theme 5. Diagnosis • Asking for advice on where to start when the parent was diagnosed |
Discussion
We assessed the feasibility of studying the mental health care needs of informal caregivers of people with ADRD using online caregiving forum data mining. Our findings demonstrate that this methodology, applied to public online forums, is valuable in identifying caregivers’ mental distress and related stressors. Furthermore, the NLP/ML-generated topics, which provided valuable representative categories, were mostly consistent with our qualitatively defined themes and the existing framework for caregivers’ mental distress and stressors. However, we also observed a limitation of the NLP/ML topic modeling: it was unable to detect a topic that was discussed with heterogeneous words, even though human readers defined that topic as a main theme. All in all, our findings highlight that the use of public online forum data from caregivers and patients could be a promising approach for gaining insights to support family caregivers.
We examined whether the content of the online discussion forum was qualitatively and quantitatively sufficient to advance our understanding of the mental distress and related stressors of informal caregivers of people with ADRD. This rich dataset provided comprehensive knowledge of the challenges and emotional experiences faced by caregivers. The majority of the discussions focused on simply expressing emotional distress or sharing day-to-day challenges as a catharsis. Yet, some caregivers specifically asked about coping with mental distress and about recommended self-care for themselves. Moreover, we found repeatedly expressed emotions besides depression and anxiety, conveyed in words such as “venting,” “mad,” “frustrated,” and “exhausted.” These findings suggest that further examination of online caregiver forum contents could illuminate the intensity of mental and emotional distress and allow us to identify common situations related to these negative emotions. Understanding the perspectives of informal caregivers of people with ADRD can deepen our insight into their caregiving burden, enabling the design of interventions that meet their needs16. Given that caregiving for ADRD patients is typically a long-term and burdensome commitment, amplifying their voices to provide tailored support could improve their mental health outcomes and quality of life. This could, in turn, potentially enhance the quality of care for ADRD patients18.
Additionally, we identified a few posts that showed caregivers’ severe mental distress with extreme expressions (e.g., “suicide”). Although fellow community peers sent supportive messages in these specific cases, some of those who wrote the posts seemed to require immediate attention and external support. This suggests that it is very important to discuss the potential role of online caregiving communities in identifying caregivers at high risk of severe mental distress and facilitating timely support, as these communities are becoming an essential resource for caregiving16,26.
In this study, we observed that the NLP/ML-modeled topics of the online caregiving forum were valid, meaning they were representative of the discussion forum and resonated with existing knowledge on caregivers’ mental distress and related stressors7. In particular, Brodaty et al.’s systematic categorization of distress among family caregivers of people with ADRD included physical and psychological morbidity (e.g., poor sleep quality, chronic disease condition), social isolation, lack of support (instrumental, emotional, and informational), institutionalization (e.g., financial burden, guilt, and depression), disease severity (e.g., behavioral problems), and experience of the caregiving role4. All eight major topics generated by the NLP/ML algorithms corresponded well with these known concerns and stressors of informal caregivers. The findings highlight that NLP/ML-enabled online forum data analysis could benefit caregiver research in assessing further issues and needed support, given that it is less resource-intensive than the conventional qualitative approach.
Notably, there were prominent discussions around the relocation of a loved one with ADRD in the online forum. The content revealed that family caregivers of those with ADRD often face significant challenges before and after institutionalization, both physically (e.g., sleep deprivation) and psychosocially (e.g., feelings of frustration, guilt, and depression). Caregivers sought advice on when, where, and how to facilitate the transition, and grappled with emotional distress post-institutionalization, often exacerbated by verbal abuse from loved ones blaming them for the move. These insights provide a detailed understanding of the challenges faced by family caregivers, aspects that are not frequently explored in existing literature. Our findings highlight the urgent need for considerable informational and emotional support for informal caregivers27, especially those in the process of institutionalization.
One difference that we observed between the NLP/ML topics and the themes needs to be noted. The topic modeling did not cover medication use, although this was defined as a major theme by the qualitative analysis. One possible reason is that people mentioned specific product names instead of the general term ‘medication’, so the NLP/ML model was not able to capture this discussion. This indicates that the results of NLP/ML topic modeling should be interpreted with caution when the method is used for content with heterogeneous words. Further research is needed on the systematic validation of the topic modeling method as it is becoming a promising tool in health research.
The limitations of the study need to be acknowledged. We used data from a public online forum, which makes it hard to guarantee the authenticity of the posts28. It is possible that some users may not be genuine caregivers of people with ADRD. Additionally, while our sample size was sufficient, our findings might not be generalizable to caregivers who do not use this specific online forum. Despite the rigorous validation procedure we followed, the machine learning algorithms might have limitations in fully capturing the subtleties and complexities of human emotions and feelings, which can lead to some degree of misclassification or oversimplification of the topics29. Further research is required to enhance the accuracy and sensitivity of these tools to better interpret the vast and complex emotional landscapes within the caregiving community. Furthermore, as the data used in our analysis were publicly available and anonymous, we had limited demographic information about the caregivers participating in the forum. Thus, we were unable to explore the possible associations between caregivers’ mental health issues and demographic factors such as age, gender, and socioeconomic status, or their relationship to the person with ADRD. In addition, to extract the caregivers’ mental distress-related posts, we used ten predefined mental health keywords. Hence, it is possible that not all the relevant posts were included in this study. Lastly, given that the word clouds could be interpreted in different ways, precise labeling was challenging. Thus, it is possible that the labels might not cover all the representative discussions.
In sum, the online caregiver forum data and NLP/ML topic modeling enabled us to study the mental distress of, and the support needed by, informal caregivers of people with ADRD. The findings from rigorous validation shed light on the potential of NLP/ML-based text analysis of online discussion forums for informal caregiver research, which can further assess the support needed by this vulnerable population. This is meaningful because online platforms provide a unique chance to access the voices and perspectives of caregivers who may not readily disclose their mental health concerns in conventional research settings12. The approach could also be applied to other online communities for patients and caregivers30, opening new opportunities for using patient- and caregiver-generated data to provide need-based, tailored support.
Acknowledgements
Funding information and the role of the funders: Funding for this article was provided by the National Institutes of Health: JK is supported by the NIH (K01MH137386), and EL is supported by the Mid-career Investigator Award in Patient-Oriented Research (K24AR075060) and a Research Project Grant (R01AR082109). These funding sources had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.
Author contributions
All authors contributed to study design and conduct (acquiring and analyzing data, drafting and revising the manuscript): JK and YKC for conception and study design. JK and YKC for data acquisition, data curation, and formal analysis. JK, ZRC, SJR, SO, VF, and YKC for methodology and investigation. JK, ZRC, MLC, and YKC for writing the original manuscript, and JK, ZRC, MLC, SJR, SO, CIR, THB, VF, RAW, EL, and YKC for several rounds of review and revision. EL and YKC for supervision. All authors critically reviewed and approved the final manuscript before submission. JK and YKC had full access to all the data and guaranteed the integrity of the work. *EL and YKC are joint senior authors.
Data availability
The data used for this study is publicly available at https://alzconnected.org/. The web-scraped data will be available from the corresponding author upon reasonable request.
Code availability
The code used for web-scraping and data analysis is available from the corresponding author upon reasonable request.
Competing interests
The authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s44184-024-00100-y.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. CDC Caregiving. Caregivers for Person with Alzheimer’s Disease or a Related Dementia. Alzheimer’s Disease and Healthy Aging (CDC, 2024); https://www.cdc.gov/aging/caregiving/alzheimer.htm.
2. 2022 Alzheimer's disease facts and figures. Alzheimer's Dement. 18, 700–789 (2022). https://doi.org/10.1002/alz.12638.
3. Seeher, K., Low, L.-F., Reppermund, S. & Brodaty, H. Predictors and outcomes for caregivers of people with mild cognitive impairment: a systematic literature review. Alzheimers Dement. https://doi.org/10.1016/j.jalz.2012.01.012 (2013).
4. Brodaty, H. & Donkin, M. Family caregivers of people with dementia. Dialogues Clin. Neurosci. https://doi.org/10.31887/DCNS.2009.11.2/hbrodaty (2009).
5. Polenick, CA; Min, L; Kales, HC. Medical comorbidities of dementia: links to caregivers’ emotional difficulties and gains. J. Am. Geriatr. Soc.; 2020; 68, pp. 609-613. [DOI: https://dx.doi.org/10.1111/jgs.16244] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31746461]
6. Pinquart, M; Sörensen, S. Differences between caregivers and noncaregivers in psychological health and physical health: a meta-analysis. Psychol. Aging; 2003; 18, pp. 250-267. [DOI: https://dx.doi.org/10.1037/0882-7974.18.2.250] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/12825775]
7. Llanque, S., Savage, L., Samaritan, L.-G. & Rosenburg, N. Concept analysis: Alzheimer’s caregiver stress. Nurs. Forum. https://doi.org/10.1111/nuf.12090 (2016).
8. Lilly, MB; Robinson, CA; Holtzman, S; Bottorff, JL. Can we move beyond burden and burnout to support the health and wellness of family caregivers to persons with dementia? Evidence from British Columbia, Canada. Health Soc. Care Commun.; 2012; 20, pp. 103-112. [DOI: https://dx.doi.org/10.1111/j.1365-2524.2011.01025.x]
9. Pearlin, LI; Mullan, JT; Semple, SJ; Skaff, MM. Caregiving and the stress process: an overview of concepts and their measures. Gerontologist; 1990; 30, pp. 583-594. [DOI: https://dx.doi.org/10.1093/geront/30.5.583] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/2276631]
10. Christopher, W. Exploring patient experiences and concerns in the online cochlear implant community: a cross-sectional study and validation of automated topic modelling. Clin. Otolaryngol. https://doi.org/10.1111/coa.14037 (2023).
11. Rahman, MS. The advantages and disadvantages of using qualitative and quantitative approaches and methods in language “Testing and Assessment” research: a literature review. J. Educ. Learn.; 2016; 6, p102. [DOI: https://dx.doi.org/10.5539/jel.v6n1p102]
12. Diefenbeck, CA; Klemm, PR; Hayes, ER. ‘Anonymous meltdown’: content themes emerging in a nonfacilitated, peer-only, unstructured, asynchronous online support group for family caregivers. CIN Comput. Inform. Nurs.; 2017; 35, pp. 630-638. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28742535]
13. Salmi, S., Mérelle, S., Gilissen, R., Van Der Mei, R. & Bhulai, S. Detecting changes in help seeker conversations on a suicide prevention helpline during the COVID-19 pandemic: in-depth analysis using encoder representations from transformers. BMC Public Health. https://doi.org/10.1186/s12889-022-12926-2 (2021).
14. Baird, A; Xia, Y; Cheng, Y. Consumer perceptions of telehealth for mental health or substance abuse: a Twitter-based topic modeling analysis. JAMIA Open; 2022; 5, ooac028. [DOI: https://dx.doi.org/10.1093/jamiaopen/ooac028] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35495736] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9047171]
15. Lin, SY et al. Social media data mining of antitobacco campaign messages: machine learning analysis of Facebook posts. J. Med. Internet Res.; 2023; 25, [DOI: https://dx.doi.org/10.2196/42863] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36780224] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972210]
16. Yoon, S et al. Analyzing topics and sentiments from Twitter to gain insights to refine interventions for family caregivers of persons with Alzheimer’s disease and related dementias (ADRD) during COVID-19 pandemic. Stud. Health Technol. Inform.; 2022; 289, pp. 170-173. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35062119] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830611]
17. Parker, D; Mills, S; Abbey, J. Effectiveness of interventions that assist caregivers to support people with dementia living in the community: a systematic review. Int. J. Evid. Based Healthc.; 2008; 6, pp. 137-172. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/21631819]
18. Holliday, AM; Quinlan, CM; Schwartz, AW. The hidden patient: the CARE framework to care for caregivers. J. Fam. Med. Prim. Care; 2022; 11, pp. 5-9. [DOI: https://dx.doi.org/10.4103/jfmpc.jfmpc_719_21]
19. Parmar, J et al. Person-centered care for family caregivers of people living with dementia: co-designing an education program for the healthcare workforce. Alzheimers Dement. J. Alzheimers Assoc.; 2021; 17, [DOI: https://dx.doi.org/10.1002/alz.052425]
20. Welcome to ALZConnected! ALZConnected. Accessed November 10, 2024. https://alzconnected.org/categories.
21. Du, Y et al. Diabetes-related topics in an online forum for caregivers of individuals living with Alzheimer disease and related dementias: qualitative inquiry. J. Med. Internet Res.; 2020; 22, e17851. [DOI: https://dx.doi.org/10.2196/17851] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32628119] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7381255]
22. Khurana, D., Koli, A., Khatter, K. & Singh, S. Natural language processing: state of the art, current trends and challenges. Multimed. Tools Appl. https://doi.org/10.1007/S11042-022-13428-4 (2022).
23. Mackey, TK; Purushothaman, V; Haupt, M; Nali, MC; Li, J. Application of unsupervised machine learning to identify and characterise hydroxychloroquine misinformation on Twitter. Lancet Digit. Health; 2021; 3, pp. e72-e75. [DOI: https://dx.doi.org/10.1016/S2589-7500(20)30318-6] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33509386]
24. Blei, DM; Ng, AY; Jordan, MI. Latent Dirichlet allocation. J. Mach. Learn. Res.; 2003; 3, pp. 993-1022.
25. Miner, A. S., Stewart, S. A., Halley, M. C., Nelson, L. K. & Linos, E. Formally comparing topic models and human-generated qualitative coding of physician mothers’ experiences of workplace discrimination. Big Data Soc. https://doi.org/10.1177/20539517221149106 (2023).
26. Patel, R; Smeraldi, F; Abdollahyan, M; Irving, J; Bessant, C. Analysis of mental and physical disorders associated with COVID-19 in online health forums: a natural language processing study. BMJ Open; 2021; 11, 56601. [DOI: https://dx.doi.org/10.1136/bmjopen-2021-056601]
27. Adelman, RD; Tmanova, LL; Delgado, D; Dion, S; Lachs, MS. Caregiver burden: a clinical review. JAMA; 2014; 311, pp. 1052-1060. [DOI: https://dx.doi.org/10.1001/jama.2014.304] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24618967]
28. Tsao, SF et al. What social media told us in the time of COVID-19: a scoping review. Lancet Digit. Health; 2021; 3, e175. [DOI: https://dx.doi.org/10.1016/S2589-7500(20)30315-0] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33518503] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7906737]
29. Hickman, L; Thapa, S; Tay, L; Cao, M; Srinivasan, P. Text preprocessing for text mining in organizational research: review and recommendations. Organ. Res. Methods; 2022; 25, pp. 114-146.
30. Hauser, TU; Skvortsova, V; De Choudhury, M; Koutsouleris, N. The promise of a model-based psychiatry: building computational models of mental ill health. Lancet Digit. Health; 2022; 4, pp. e816-e828. [DOI: https://dx.doi.org/10.1016/S2589-7500(22)00152-2] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36229345] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9627546]
© The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Informal caregivers of people with Alzheimer’s disease and related dementias (ADRD) are at risk of poor mental health. This study aimed to investigate the feasibility and validity of studying caregivers’ mental stressors using online caregiving forum data (March 2018–February 2022) and natural language processing and machine learning (NLP/ML). NLP/ML topic modeling generated eight prominent topics, which we compared with qualitatively defined themes and the existing caregiving framework to assess validity. Among a total of 60,182 posts, 5,848 were mental distress-related; these concerned the ADRD patients (symptoms, medication, relocation, care duty share, diagnosis, conversation strategy) and the caregivers (caregiving burden and support). While we observed some novel topics among the NLP/ML-defined topics, most were aligned with the existing framework. For feasibility assessment, qualitative title screening was done. The findings shed new light on the potential of NLP/ML text analysis of the online forum for informal caregivers to prepare tailored support for this vulnerable population.
Details

1 Stanford University, Department of Medicine, Stanford Center for Digital Health, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956); Stanford University, Department of Dermatology, School of Medicine, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956)
2 Stanford University, Department of Psychiatry and Behavioral Sciences, School of Medicine, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956); Veterans Affairs Palo Alto Health Care System, Palo Alto, USA (GRID:grid.280747.e) (ISNI:0000 0004 0419 2556)
3 Stanford University, Department of Medicine, Biomedical Informatics, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956)
4 University of California, Davis, Department of Computer Science, College of Engineering, Davis, USA (GRID:grid.27860.3b) (ISNI:0000 0004 1936 9684)
5 University of California, Davis, Department of Public Health Sciences, School of Medicine, Davis, USA (GRID:grid.27860.3b) (ISNI:0000 0004 1936 9684)
6 University of Pittsburgh, Department of Health Information Management, School of Health and Rehabilitation Sciences, Pittsburgh, USA (GRID:grid.21925.3d) (ISNI:0000 0004 1936 9000)