Purpose
This study investigates associations between Facebook (FB) conversations and self-reports of substance use among youth experiencing homelessness (YEH). YEH engage in high rates of substance use and are often difficult to reach, for both research and interventions. Social media sites provide rich digital trace data for observing the social context of YEH's health behaviors. The authors aim to investigate the feasibility of using these big data and text mining techniques as a supplement to self-report surveys in detecting and understanding YEH attitudes and engagement in substance use.
Design/methodology/approach
Participants took a self-report survey in addition to providing consent for researchers to download their Facebook feed data retrospectively. The authors collected survey responses from 92 participants and retrieved 33,204 textual Facebook conversations. The authors performed text mining analysis and statistical analysis including ANOVA and logistic regression to examine the relationship between YEH's Facebook conversations and their substance use.
Findings
Facebook posts of YEH have a moderately positive sentiment. YEH substance users and non-users differed in their Facebook posts regarding: (1) overall sentiment and (2) topics discussed. Logistic regressions show that more positive sentiment in a respondent's FB conversation suggests a lower likelihood of marijuana usage. On the other hand, discussing money-related topics in the conversation increases YEH's likelihood of marijuana use.
Originality/value
Digital trace data on social media sites represent a vast source of ecological data. This study demonstrates the feasibility of using such data from a hard-to-reach population to gain unique insights into YEH's health behaviors. The authors provide a text-mining-based toolkit for analyzing social media data for interpretation by experts from a variety of domains.
1. Introduction
Youth experiencing homelessness (YEH) are exposed to a wide range of risk factors and vulnerabilities. Research indicates a greater frequency and severity of mental health problems among this population than in their housed peers (Pedersen et al., 2018) as well as psychological trauma such as abuse, violence, and sexual victimization (Keeshin and Campbell, 2011).
Studies show that YEH engage in substantially higher levels of substance use than housed youth (Moore et al., 2019; Tabar et al., 2020). In total, 69% of YEH meet the criteria for dependence on at least one substance, compared to 1.8% of all US adolescents (Baer et al., 2003); 75% of YEH have reported marijuana use, 25% have reported crack use, and 32% have reported using stimulants (Baer et al., 2003; Bousman et al., 2005; Greene et al., 1997; Hadland et al., 2011; Narendorf et al., 2017). These high rates are concerning because substance use is associated with a range of adverse outcomes, including longer episodes of homelessness, physical and sexual victimization, and physical and mental health issues (Bender et al., 2015; Santa Maria et al., 2018). Reducing YEH's substance abuse problems is a critical first step in assisting this marginalized population in overcoming the challenges they encounter.
Although place-based interventions can help the treatment of substance use among YEH (e.g. Xiang, 2013), YEH are typically hesitant to seek help from traditional health practitioners (Barman-Adhikari and Rice, 2011), owing to an inherent skepticism of formal service systems (Hudson et al., 2010). Therefore, other engagement and monitoring measures should be explored.
Studies find that 80% of YEH report using social media weekly (Pollio et al., 2013). One reason for this reliance on social media is that YEH usually lack a stable phone number and address with which to maintain social connections. Another reason is that these sites give users the opportunity to obtain online and offline social capital (Huang et al., 2021), a resource that YEH need for improved outcomes (Barman-Adhikari et al., 2016a). As a result, YEH exchange information, opinions, and feelings on social media sites, leaving valuable digital trace data. Such data represent a novel avenue for gaining a better understanding of YEH's social world and present a practical way of intervening with this at-risk group.
Although prior research has attempted to assess social media usage and health-risk behaviors among YEH, it falls short in two respects. First, while these studies demonstrate associations between social media behavior and health-related behaviors such as risky sexual behaviors and substance use (e.g. Barman-Adhikari and Rice, 2011; Hammond et al., 2018; Rice et al., 2010, 2011; Young and Rice, 2011), most of them relied on surveys. Survey data are hard to obtain and limited by their reliance on self-reports and retrospective recall. Alternative approaches that can detect potential substance use without relying on survey data are worth exploring.
Second, although more recent studies have started to mine social media conversations to identify substance use, they often examine discussions by the general public (e.g. Rose et al., 2017) or other youth groups (e.g. ElTayeby et al., 2019; Hammond et al., 2018; Tran et al., 2018), not by YEH. As discussed earlier, substance use is much higher among YEH and follows a unique pattern that requires a targeted approach to understanding and intervention. The elevated substance use among this population is symptomatic of the social and emotional challenges that these young people face both before and after becoming homeless. This usually means that they have different needs and priorities than other young people. These needs may be reflected in their social media interactions. For instance, research has shown that homeless youth use social media in significantly different ways than non-homeless youth when it comes to communicating their needs, frustrations, and financial or social problems, and to interacting with others. Hu et al. (2019) examined homeless adults' Tumblr posts and found that they were more likely to contain terms with negative affect than posts made by non-homeless people. These negative feelings are frequently salient indications of substance use engagement (Dorison et al., 2020) and should be investigated further in the context of social media interactions among this group.
This study addresses these gaps by utilizing both YEH's offline health behavior data from surveys and digital trace data from social media to investigate whether there is an association between their social media conversations and substance use behavior. We pose two exploratory research questions. Our first question is: Do the social media conversations of substance-using and non-using YEH differ in terms of their sentiment and topics, and if so, how? Scholars in substance use research have advocated for the incorporation of emotion theory in studying substance abuse behavior (e.g. Quirk, 2001) because emotional states can be an essential component in drug use (Childress et al., 1994; Cooney et al., 1997) and relapse (Tiffany, 1990). In addition, topics mentioned in users' conversations can reveal potential root causes of substance usage (Curtis et al., 2018). Empirical studies show that within the general population, both the style and the content of the language used by social media users can be related to alcohol drinking (Curtis et al., 2018; Marengo et al., 2019) and vaping (e.g. Allem et al., 2018). Therefore, both theory and empirical studies suggest that the sentiment and topics in social media conversations may differ between substance users and non-users. We explore whether similar patterns exist for YEH and what such differences are. An understanding of these differences can help researchers identify the potential root causes of substance use. Provided that such differences exist, our second research question is: Can we use the sentiment and topics expressed in YEH's social media conversations to detect their substance-using behavior?
We analyzed a total of 33,204 textual Facebook (FB) conversations and health-related survey responses from a group of 92 YEH. We developed a text-analytics-based framework to perform sentiment and topic analyses. We show that sentiment and topic patterns differ significantly between substance users and non-users among YEH. Furthermore, the sentiment and topics in the social media conversation are related to certain types of substance use.
We contribute to the literature in three ways. First, we provide the first descriptive analysis of YEH's social media conversations regarding sentiment and topics. The results reveal the difference in the conversation patterns between substance users and non-users in this group. These differences provide insights regarding substance-using YEH's mental well-being and triggers for substance use. Second, we establish that the sentiment and topics in YEH's social media conversations are associated with certain substance use behaviors. This finding provides actionable implications for designing effective health surveillance and intervention programs for this hard-to-reach population. Third, we demonstrate how natural language processing techniques can serve as an alternative research method to understand the social context of health behaviors of specific vulnerable groups through digital trace data. Our method offers an alternative monitoring tool that leverages big data and natural language processing and can benefit public health observation applications.
2. Literature review
We review two streams of research that are related to our study. First, we review studies that assess social media use and health-risk behaviors among YEH. Second, we discuss studies that use text mining techniques to analyze digital trace data for the general population's substance use behavior or attitude towards substance use.
2.1 Using social media to understand health-risk behaviors among YEH
Similar to the general young adult population, YEH use social media to stay connected with their peers as well as other family members (Rice and Barman-Adhikari, 2014). Prior studies suggest that more than 80% of YEH use social networking websites such as Facebook and MySpace (Barman-Adhikari et al., 2016b). A group of studies assessing the social media usage pattern and sexual health behaviors among YEH revealed some preliminary insights into how social media is associated with sexual health behaviors among this at-risk population (e.g. Barman-Adhikari and Rice, 2011; Rice et al., 2010; Young and Rice, 2011). These survey-based studies report a relationship between YEH's online social networking behavior and the tendency to seek sexual-related health information and engage in sexual risk behaviors. Rice et al. (2010) find that YEH who are connected with family and home-based peers through social networking sites are less likely to exchange sex and more likely to report a recent HIV test. In contrast, youth connecting to street-based peers online are more likely to engage in exchanging sex for money or to meet their other basic needs. Rice et al. (2012) found that the social connections maintained on Facebook were related to the acceptability of different types of HIV prevention programs. Young and Rice (2011) find that using online social networks for partner seeking is associated with increased sexual risk behaviors among YEH.
More recently, studies have shown that it is not only the structure or type of social network connections but also the content of such interactions (e.g. email content and conversation topics) that has an impact on YEH's health (Barman-Adhikari et al., 2016b; Barman-Adhikari and Rice, 2011). Barman-Adhikari et al. (2016a, b) found that YEH used social media to converse about a range of topics. When they talked about topics such as drugs, drinking, or partying, they were more likely to engage in concurrent sex; when they talked about personal goals, plans, and safe sex, they were more likely to engage in protective sexual behaviors. These scholars, therefore, advocate for the importance of using social media as a resource for social workers to assess this hard-to-reach group and connect them with care.
The studies reviewed above depend on survey data, which are often onerous to collect, especially for this group. Furthermore, the social media usage data obtained from the participants are indirect measures obtained solely via self-report questionnaires. Such data depend on the retrospective recall of the respondents and may not reflect the most accurate information. Nevertheless, these studies have established the connection between social media usage and health-related behavior among YEH. In particular, there is evidence that the topics discussed on social media sites influence the health behaviors of YEH (Barman-Adhikari et al., 2016b). Following this line of work, the current study goes a step further by leveraging observational digital trace data of YEH from social network sites and investigating how their online discussions may be related to offline substance use behavior.
2.2 Text mining social media data for the general population
Researchers in information systems (IS) have used digital trace data from a variety of domains such as virtual collaborations (e.g. Ahuja and Carley, 1999), free/libre open-source software development teams (e.g. Hossain and Zhu, 2009; Long and Siau, 2008), electronic commerce (e.g. Bampo et al., 2008) and open collaboration (Kane, 2009), on a broad range of issues including team performance (Wu et al., 2007), sense-making (Abbasi et al., 2018), productivity (Aral et al., 2012), and disaster response (e.g. Ofli et al., 2016; Son et al., 2019).
Within this body of research, a stream of work text-mines digital trace data from social media to understand substance use behaviors. Early studies in this field can be grouped into two categories. The first group attempts to use social media posts to explore opinions and attitudes towards substance use. For example, the general public's opinions towards marijuana (Tran et al., 2018), menthol cigarettes (Rose et al., 2017), and alcohol (Riordan et al., 2019) were assessed by mining social media conversations from sites such as Twitter, Instagram, and Facebook. Among these studies, Rose et al. (2017) found that positive sentiment toward menthol cigarettes is dominant among likely smokers, while non-smokers and former smokers express more negative sentiment. Tran et al. (2018) observed more negative words than positive words in general users' emotional reactions to marijuana-related posts on Facebook. Scholars have also examined social media discussions around substance use by certain communities. For example, Zhou et al. (2016) found common interests shared by drug users by mining their Instagram conversations. Hammond et al. (2018) reported a generally positive sentiment toward substances among college students. These studies reveal valuable insights regarding motivation for use (Riordan et al., 2019) and use patterns (Zhou et al., 2016). Xie et al. (2021) used a deep learning approach to discover opioid use disorder treatment barriers from patient narratives on social media sites.
The second group of studies focuses on identifying substance-related posts on social media (e.g. ElTayeby et al., 2019; Roy et al., 2017). ElTayeby et al. (2019) built a support vector machine model to predict whether a Facebook post is drinking-related or not, using the term frequency-inverse document frequency vector of the posts. Their model reached an F1 score of 0.72. Roy et al. (2017) used image and textual features of posts to detect substance use-related social media posts, achieving an accuracy of 90%. Unlike our study, these studies focus on the posts themselves: they attempt to summarize or predict the contents of social media messages and do not concern individuals' offline substance use behaviors.
More recent studies have started to connect social media posts with offline substance behavior. This group of studies aims to use social media texts (along with other features including profile information, images, etc.) to predict substance use behaviors for social media users. Kosinski et al. (2013) and Ding et al. (2017) found that Facebook likes and keywords in posts can predict participants' substance use. Curtis et al. (2018) used keywords and topics from Twitter data to predict county-level excessive drinking. Marengo et al. (2019) performed a similar analysis but at the individual user level. They found a similar relationship between the use of words and an individual's problem drinking: the use of words regarding family, school, and positive feelings was negatively associated with problematic drinking, while coarse words, and words indicating interest in sports events, politics, and nightlife, were more frequently observed among problematic drinkers. Two more recent studies used machine learning techniques to predict individuals' risk for substance use (Hassanpour et al., 2019) or transition from casual use to serious abuse (Lu et al., 2019) based on their social media posts. Wang et al. (2019) applied machine learning techniques to users' posts in an online cessation community to classify an individual's smoking status. While those studies incorporate offline substance behavior obtained through surveys, they are designed for general social media users or housed youth, not YEH. Table 1 summarizes representative studies in this field, the population they studied, as well as their major findings.
3. Theoretical development
We draw from emotion theory and social bond theory to argue why we may observe different language use patterns (sentiment and topics) in the social media messages between substance-using and non-using YEH. We also discuss the value of understanding such different patterns in developing detection and prevention programs.
3.1 Emotion in YEH's social media conversations and substance use
We aim to examine the emotions in YEH's social media conversations and how they can be associated with their substance use behavior. Researchers have been using a variety of theoretical lenses to investigate substance use and addictive behavior, including the genetic perspective, personality perspective, and learning-based perspective. More recent research in this field has turned its attention to the emotion theory perspective (Quirk, 2001) and demonstrated the causal role emotional states play in substance use behavior. For example, studies have documented a high level of emotional instability among substance abusers (Barnes, 1983; McCormick et al., 1998), as well as the connection between emotional states and substance use (e.g. Childress et al., 1994; Cooney et al., 1997; Tiffany, 1990).
In particular, negative emotion has been observed to be associated with substance use behavior, the inability to withdraw, and the tendency to relapse. For example, studies found that negative mood is associated with a craving for alcohol (Childress et al., 1994; Cooney et al., 1997). This association may be explained by the common belief in the ability of substances to alleviate negative moods and reduce stress (Cooper et al., 1992, 1995). Negative emotions can also be associated with continuous use and potential relapse (e.g. Cooney et al., 1997; Tiffany, 1990). Cooney et al. (1997) conducted a lab experiment with men with alcoholism and reported that negative emotion-inducing cues can trigger the participants' desire to relapse. Tiffany (1990) reported that negative emotional states can interfere with a conscious effort to interrupt automatic drug use behavior, therefore leading to continuous use or relapse.
Based on the emotion theory and empirical studies, we argue that the sentiment in users' social media messages can serve as cues for substance use. We hypothesize that the emotions in such conversations would be significantly different between users and non-users among YEH. Furthermore, because of the established relationship between negative emotions and substance use, we expect to see a more negative sentiment in substance users' conversations in this population.
3.2 Topics in YEH's social media conversations and substance use
The second component we aim to examine is the topics in YEH's social media messages and their relations with substance use. Psychoanalytic theory has suggested a relationship between the content of our conversations and our social behaviors (Rapaport, 1960). The words people use in their daily lives reveal important aspects of their social and psychological worlds (Pennebaker et al., 2003). An understanding of the content differences (keywords, topics, etc.) between the social media messages of substance-using and non-using YEH provides two benefits. First, these differences may reveal the root causes or triggers of substance usage. Such knowledge can help social workers and agencies develop more efficient and targeted intervention methods. Second, such differences can serve as cues for substance use detection for this group. As reviewed earlier, empirical evidence shows the relationship between social media post topics and substance use among general social media users. Curtis et al. (2018) reported a relationship between language use on Twitter and county-level excessive alcohol consumption. They found that excessive alcohol consumption was positively correlated with topics such as sports events, music, art, and food-related festivals. They argued that alcohol consumption is considered socially acceptable at such events, making these events attractive to groups that tend to engage in binge drinking. This pattern is also observed on Facebook, where Marengo et al. (2019) reported that words indicating interest in sports events and nightlife were more frequently observed among problematic drinkers. With this information, public health interventions targeted at reducing access or deterring excess alcohol consumption at such events may be implemented.
A negative relationship between topics and substance use may also reveal the root causes of use and potential intervention methods. Studies have found that the lack of social bonds and social support can lead to substance use among adolescents (Ensminger et al., 1982; Wills and Vaughan, 1989). Social control theory defines social bonds as an individual's attachments, commitments, involvements, and beliefs in social institutions (Hirschi, 1969). Social support theory defines social support as tangible and intangible aids one can obtain from his or her social networks (House, 1981). As such, we may observe an absence of social bonds in substance users' posts. Marengo et al. (2019) found that the use of words regarding family, school, and positive feelings was negatively associated with problematic drinking. This means that excessive drinkers tend to talk about friends and family less often than their peers, which may indicate a lack of social bonds and social support. Programs that assist substance users in this group to form and maintain social bonds may help reduce their risk of using.
In summary, both theories and past research suggest that topics in social media content can be related to substance use behaviors, at least for general social media users. We built on these studies and hypothesized that there are significant differences in terms of topics expressed in social media conversations between substance users and non-users among YEH. In the following sections, we describe how we explore such differences.
4. Data and methodology
4.1 Data description
We recruited youth experiencing homelessness at a non-profit organization located in the Ballpark neighborhood of downtown Denver between July 2017 and March 2018. Recruiters were present at the agency for over six months, throughout service provision hours, to approach and screen youth and invite them to participate. Youth who were interested in the study were screened for eligibility and for whether they had owned a Facebook profile for at least a year. For youth who met eligibility criteria, we sought informed consent for participation and obtained their FB account information. We obtained two types of data: (1) we collected participants' social media conversations for the past year, including their FB posts and comments; and (2) we asked participants to complete a survey on their demographic information, health conditions, sexual behavior, and substance usage behavior. After removing Facebook posts and comments without any meaningful textual messages [1] and participants who did not finish the surveys, we obtained 33,204 FB conversations (posts and comments) authored by 92 survey participants. Table 2 summarizes the general aspects of the survey data and its participants' Facebook conversations (posts and comments).
4.2 Sentiment analysis of Facebook conversations
Sentiment analysis (SA) is the computational detection of emotions and sentiments in texts (Pang and Lee, 2004). We used Valence Aware Dictionary and sEntiment Reasoner (VADER) [2], a lexicon and rule-based sentiment analysis tool developed in Python, to perform sentiment analysis on our dataset of Facebook conversations. We chose VADER because it is attuned to sentiments expressed in social media texts: its lexicon includes emoticons, slang, and abbreviations, which are common in such texts (Hutto and Gilbert, 2016). To calculate the sentiment intensity expressed in each Facebook conversation, we first used VADER's sentiment lexicon to detect the words in the conversation that carry a sentiment orientation. The lexicon comprises lexical features such as words, punctuation, phrases, and emoticons, each assigned a valence score (Hutto and Gilbert, 2016). A valence score describes the degree of sentiment intensity, from most negative (−1) to most positive (+1) [3]. We then computed an overall sentiment score for each Facebook conversation by summing the valence scores of all the words detected within the conversation, adjusting them according to grammatical and syntactical rules such as negation and degree intensifiers [4], and then averaging and normalizing the result to between −1 and 1.
We applied SA to the 33,204 FB conversations authored by our survey participants. Their average sentiment score is 0.094, with a standard deviation of 0.443, indicating a slightly positive general sentiment among posts and comments by our participants. Figure 1 shows the distribution of the sentiment scores of FB conversations authored by our participants.
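To make the scoring steps concrete, the following is a minimal sketch of a lexicon and rule-based scorer. It is not VADER itself and does not reproduce our pipeline: the toy lexicon, the intensifier boost, the negation scalar, and the normalization function x / sqrt(x² + α) are all illustrative assumptions, and tokenization is a plain whitespace split.

```python
import math

# Hypothetical toy lexicon; valence values are illustrative only.
TOY_LEXICON = {"love": 3.2, "great": 3.1, "happy": 2.7,
               "sad": -2.1, "hate": -2.7, ":)": 2.0}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 0.3, "really": 0.3}  # assumed boost to the magnitude
NEGATION_SCALAR = -0.74                      # assumed damped sign flip
ALPHA = 15                                   # assumed normalization constant


def score_conversation(text: str) -> float:
    """Sum rule-adjusted valences, then squash the sum into (-1, 1)."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token)
        if valence is None:
            continue  # token carries no sentiment in the toy lexicon
        prev = tokens[i - 1] if i > 0 else ""
        prev2 = tokens[i - 2] if i > 1 else ""
        if prev in INTENSIFIERS:
            # Push the valence further from zero, keeping its sign.
            valence += math.copysign(INTENSIFIERS[prev], valence)
        if prev in NEGATIONS or (prev in INTENSIFIERS and prev2 in NEGATIONS):
            valence *= NEGATION_SCALAR
        total += valence
    return total / math.sqrt(total * total + ALPHA)
```

With this sketch, a conversation with no lexicon words scores exactly 0, an intensifier raises the magnitude of the score, and a preceding negation reverses its sign.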
4.3 Topic modeling of Facebook conversations
It is important to identify the topics of a Facebook conversation because we seek significant or discernible relationships between the topics in Facebook conversations and the specific offline behaviors that participants reported in their survey. To extract topics expressed in the Facebook conversations, we need to specify: (1) a set of underlying topics that are often mentioned in our participants' Facebook conversations; and (2) a set of descriptors representing each topic. A descriptor can be a term, a type of term [5], or a regular expression rule [6]. In the process of topic mining, each Facebook conversation is assigned to a topic if the conversation matches one of the topic's descriptors. For example, if a Facebook conversation mentions the phrase “my best friend,” which matches the descriptor “best friend” for the topic “friend”, then the conversation is classified as expressing the “friend” topic. Note that just as a Facebook user can mention more than one topic in a post or comment, each conversation can be classified into multiple topics.
To perform topic mining, we first utilized IBM SPSS Modeler [7] and its built-in “Thoughts and Feelings” text analysis package [8] to process the 33,204 Facebook conversations. We chose this package because its topics are best suited to our context. Second, we observed that the participants frequently used hashtags in their Facebook conversations. Such hashtags were added by users to mark the theme or topic of their posts. Since those hashtags often present the underlying topic, we used them, together with the predefined topic sets from the previous step, as our candidate seed topics. Third, we used the text link analysis and text clustering analysis functions offered by the IBM SPSS Modeler to identify more descriptors for topics. This effort enabled us to find how strongly descriptors emerge or are related to each other within a particular topic or across topics. More specifically, the text link analysis identifies co-occurrences of terms or types of terms in a Facebook conversation.
Similarly, the clustering analysis groups terms that are mentioned together within a Facebook conversation. Table A1 in the appendix provides examples of descriptors identified by the text link analysis and text clustering analysis, the topic to which each descriptor is assigned, and the frequency of those descriptors across all Facebook conversations in our dataset. We then examined the output of these analyses for candidate descriptors based on their frequency and their subject. If a frequent pattern of descriptor co-occurrences across the Facebook conversations did not fit any of the predefined topics, we created a new topic for the pattern. Finally, we used IBM SPSS Modeler's built-in function to extend the identified descriptor sets under each topic automatically, based on three linguistic-based methods (concept inclusion, concept derivation, and semantic networks), and thereby derive a more complete set of descriptors. Table A2 in the appendix provides illustrative examples of each method.
We performed this process iteratively until no new pattern emerged. This procedure resulted in a final set of 31 topics. Table 3 summarizes the top ten most frequent topics, their exemplar descriptors, exemplar messages, as well as the number of documents that were classified under each topic [9]. Note that a post can have multiple topics (Zhao et al., 2011). This is congruent with prior research. For example, Wang et al. (2013) found that 50% of FB status updates have more than one topic. Commonly used topic modeling methods, such as latent Dirichlet allocation (Blei et al., 2003), are developed under this assumption.
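The descriptor-matching rule can be illustrated with a short sketch. It is not the IBM SPSS Modeler implementation: descriptors are expressed here as regular expressions, and apart from the “best friend” descriptor for the “friend” topic mentioned above, the topics and patterns are hypothetical.

```python
import re

# Hypothetical descriptor sets; only "best friend" -> "friend" is from the text.
TOPIC_DESCRIPTORS = {
    "friend": [r"\bbest friend\b", r"\bbuddy\b"],
    "money": [r"\bmoney\b", r"\$\d+"],
}

# Compile once so each conversation is only scanned, not re-parsed.
COMPILED = {
    topic: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for topic, patterns in TOPIC_DESCRIPTORS.items()
}


def classify_topics(conversation: str) -> set:
    """Return every topic with at least one descriptor matching the text."""
    return {
        topic
        for topic, patterns in COMPILED.items()
        if any(p.search(conversation) for p in patterns)
    }
```

Because the function returns a set, a single conversation can carry several topics at once, and a conversation that matches no descriptor is left unclassified.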
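The co-occurrence counting that underlies this kind of link analysis can be sketched as follows. This is a simplified stand-in for the IBM SPSS Modeler functions, not a description of their internals; the function name, the whitespace tokenization, and the example vocabulary are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(conversations, vocabulary):
    """Count how many conversations mention each pair of vocabulary terms."""
    counts = Counter()
    for text in conversations:
        tokens = set(text.lower().split())
        # Sort so each pair is always stored in the same (a, b) order.
        present = sorted(tokens & vocabulary)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical example data.
example = ["need money for rent", "rent is due no money", "my family"]
pairs = cooccurrence_counts(example, {"money", "rent", "family"})
```

Pairs that co-occur frequently but fit no existing topic are the kind of pattern that would prompt the creation of a new topic.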
4.4 ANOVA analysis
To examine patterns in the Facebook conversations by participants' level of use of a particular substance (their offline behavior), we grouped the 33,204 Facebook conversations from 92 participants by dichotomized substance use level (substance users and non-users), based on their survey responses. Specifically, we investigated whether the Facebook conversations of users of a given substance differ significantly from those of non-users in two aspects: (1) the overall sentiment, and (2) the topics. We then performed an analysis of variance (ANOVA) test for each type of substance to assess whether any difference was statistically significant. We conducted this investigation for seven substances: alcohol, cocaine, heroin, ecstasy, marijuana, crack, and meth. Table A3 in the appendix lists the questions and answer codes for the seven substances.
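For reference, the F statistic of a one-way ANOVA can be computed as in the sketch below, assuming each conversation's sentiment score is one observation and the groups are users and non-users of a given substance. The function name and the example scores are hypothetical; obtaining a p-value additionally requires the F distribution, which is omitted here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical sentiment scores for non-users and users of one substance.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

With two groups, this F test is equivalent to a two-sample t-test with pooled variance (F equals t squared).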
4.5 Logistic regression analysis
Finally, we attempted to predict users' offline substance use behavior (reported in surveys) by using sentiment and topics expressed in their social media conversations (collected from Facebook). We constructed a dataset that included respondents' FB conversation information, as well as their demographic information and substance usage behavior information reported in their survey answers in the following steps. First, we aggregated the 92 respondents' 33,204 FB conversations at the respondent level. For each respondent, we calculated the average sentiment score of all his/her conversations, the average conversation length (in characters), the total number of conversations, as well as the proportion of each of the topics mentioned in his/her conversations. Second, we combined these data with respondents' information such as age, gender, whether the respondent is currently attending school, etc. Five of these respondents were omitted due to missing values in their gender or age information. We were left with a dataset of 87 respondents. Third, for each of the seven substances, we developed two or three consumption levels: non-usage (0), low/moderate usage (1), and high usage (2), based on the distribution of data. How each level is defined for each substance is summarized in Table 4. For each substance, we assigned each respondent to one consumption level based on the number of times she had consumed the substance in the past 30 days before taking the survey. The summary statistics of this dataset are in Table 4.
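The respondent-level aggregation in the first step can be sketched as follows. The record layout (the keys `respondent_id`, `text`, `sentiment`, and `topics`) is hypothetical, and the topic proportion is interpreted here as the share of a respondent's conversations tagged with that topic, which is an assumption about the exact definition.

```python
from collections import defaultdict


def aggregate_by_respondent(conversations, topics):
    """Collapse conversation-level records into one feature row per respondent."""
    grouped = defaultdict(list)
    for conv in conversations:
        grouped[conv["respondent_id"]].append(conv)

    features = {}
    for rid, convs in grouped.items():
        n = len(convs)
        row = {
            "n_conversations": n,
            "avg_sentiment": sum(c["sentiment"] for c in convs) / n,
            "avg_length": sum(len(c["text"]) for c in convs) / n,  # characters
        }
        for topic in topics:
            # Share of this respondent's conversations tagged with the topic.
            row[f"prop_{topic}"] = sum(topic in c["topics"] for c in convs) / n
        features[rid] = row
    return features


# Hypothetical records for two respondents.
records = [
    {"respondent_id": "r1", "text": "abcd", "sentiment": 0.5, "topics": {"money"}},
    {"respondent_id": "r1", "text": "ab", "sentiment": -0.5, "topics": set()},
    {"respondent_id": "r2", "text": "abc", "sentiment": 0.2, "topics": {"friend", "money"}},
]
rows = aggregate_by_respondent(records, ["friend", "money"])
```

Each resulting row can then be joined with the survey variables (age, gender, school attendance, and substance consumption level) on the respondent identifier.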
We ran ordered logistic regression for each of the seven substances, using the substance usage level as the dependent variable, and the respondent's FB conversation characteristics as independent variables. We included demographic information such as age, gender, and education as control variables. To test the goodness of fit of our models, we reported measures including Cox and Snell, Nagelkerke, and McFadden measures of R2. In addition, we followed Fagerland and Hosmer (2017) to perform the ordinal Hosmer–Lemeshow (HL) test.
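To make the three goodness-of-fit measures concrete, the sketch below computes them from a fitted and a null (intercept-only) log-likelihood using their standard textbook definitions. It is an assumption that these are the exact definitions used by the authors' software. The fitted log-likelihood (−71.77), sample size (87), and McFadden value (0.23) are the marijuana entries in Table 7; the null log-likelihood is not reported, so it is back-solved here from the rounded McFadden value and is therefore only approximate.

```python
import math


def pseudo_r2(ll_model, ll_null, n):
    """Return (McFadden, Cox-Snell, Nagelkerke) pseudo R-squared values."""
    mcfadden = 1 - ll_model / ll_null
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    # Cox-Snell cannot reach 1; Nagelkerke rescales it by its maximum.
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return mcfadden, cox_snell, nagelkerke


ll_model = -71.77                 # Table 7, marijuana model
n = 87                            # Table 7, number of observations
ll_null = ll_model / (1 - 0.23)   # assumption: back-solved from rounded McFadden R2
mcfadden, cox_snell, nagelkerke = pseudo_r2(ll_model, ll_null, n)
print(round(mcfadden, 2), round(cox_snell, 2), round(nagelkerke, 2))
```

Under this back-solved null log-likelihood, the Cox and Snell value comes out near the 0.39 reported in Table 7, and the Nagelkerke value comes out slightly below the reported 0.45, a gap that is within what rounding of the inputs can produce.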
5. Results
5.1 Sentiment comparisons among survey respondents' FB conversations
The average sentiment score of all 33,204 FB conversations by the survey respondents is 0.094 with a standard deviation of 0.443, indicating slightly positive sentiment. We further examined how the sentiment in the conversations differs between substance users and non-users across the different types of substances using ANOVA. Table 5 summarizes the results of the ANOVA tests for the seven substances. The results in Table 5 indicate that the sentiments in the Facebook conversations posted by substance users and non-users differ significantly (at the 5% level of significance) for all seven substances. Note that on average, for alcohol, marijuana, cocaine, and ecstasy, the non-users expressed more positive sentiments in their Facebook conversations than the users, while for crack, heroin, and meth, the users showed more positive sentiments in their conversations than the non-users. Figure 2 presents the sentiment comparison between non-users and users, grouped by substances.
Figure A1 in the appendix shows the conversation sentiment comparison among all groups of respondents, based on their consumption level of all seven substances.
5.2 Topic comparison among survey respondents' FB conversations
Table 6 summarizes the top ten topics mentioned by non-users and users of all substances, the percentage of Facebook conversations that were classified for each topic, and the p-values of ANOVA tests of proportion.
Table 6 indicates that in general, non-users tend to discuss entertainment, life, work, and religion more often than users, with a significant difference. On the other hand, users tend to discuss money and finance significantly more often than non-users, except for meth users. Figure 3 presents a graphic comparison of topics mentioned between non-users and users, organized by substances.
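Each cell in Table 6 reports a difference in topic shares together with the two underlying percentages; for example, the alcohol entry for Appearance, 2.35% (9.77–7.42), is the non-user share minus the user share. A minimal sketch of that calculation follows. The function names, the `topics` field, and the sample records are hypothetical, and the significance test itself is not shown.

```python
def topic_share(conversations, topic):
    """Percentage of conversations classified under the given topic."""
    if not conversations:
        return 0.0
    hits = sum(1 for c in conversations if topic in c["topics"])
    return 100.0 * hits / len(conversations)


def share_difference(non_user_convs, user_convs, topic):
    """Return (difference, non-user %, user %), with difference = non-users minus users."""
    non_user_pct = topic_share(non_user_convs, topic)
    user_pct = topic_share(user_convs, topic)
    return non_user_pct - user_pct, non_user_pct, user_pct


# Hypothetical classified conversations (not study data).
non_user_convs = [
    {"topics": ["Religion"]},
    {"topics": ["Religion", "Life"]},
    {"topics": []},
    {"topics": ["Work"]},
]
user_convs = [
    {"topics": ["Religion"]},
    {"topics": ["Money and Finance"]},
    {"topics": []},
    {"topics": ["Drug"]},
]
diff, non_user_pct, user_pct = share_difference(non_user_convs, user_convs, "Religion")
print(diff, non_user_pct, user_pct)
```

A positive difference means the topic is more common among non-users, and a negative difference means it is more common among users, which is how the signs in Table 6 can be read.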
5.3 FB conversation and substance use
To examine whether sentiment and topics in FB conversations can predict substance use, we ran ordered logistic regression at the respondent level, using the usage level of each of the seven substances as the dependent variable. The results are summarized in Table 7.
Overall, Table 7 shows that there is a significant relationship between FB conversations and the usage levels for two substances: alcohol and marijuana. For the other types of drugs, we did not observe a significant association between YEH's FB conversations and their usage level.
The sentiment in the FB conversation is significantly related to the marijuana usage level. The negative coefficient (−5.48) indicates that the higher the sentiment, the less likely the respondent will be in a high-usage group. McFadden's R2 of 0.23 suggested a good model fit (McFadden, 1977). Furthermore, according to the p-value of the ordinal HL test, we found no evidence of a lack of fit for this model. Therefore, a more positive overall sentiment in a respondent's FB conversation suggests a lower likelihood of marijuana usage. This is consistent with our findings in Table 5, where we observed that the FB conversations of non-users of marijuana have a significantly higher sentiment score than those of users (0.138 vs 0.081). In terms of topics in the conversation, the Money and Finance topic is significantly related to both alcohol usage and marijuana usage. The positive coefficients (40.63 for alcohol, 56.10 for marijuana) indicate that the more the respondent mentions a finance-related topic in his/her conversation, the more likely he/she will be in a high-usage group for both substances. This finding is partially consistent with the findings in Table 6, where we found that FB conversations of marijuana users mention money and finance-related topics significantly more often than those of non-users, while there is no significant difference between non-users and users of alcohol.
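The size of these coefficients can be made more tangible by converting them to odds multipliers. Under the proportional-odds form of the ordered logistic model, and assuming the sign convention implied by the interpretation above (a positive coefficient raises the odds of a higher usage level), a change of Δ in a predictor multiplies the odds of being in a higher usage category by exp(β·Δ). The sketch below applies this to the Table 7 coefficients; the increments (0.10 in average sentiment, 0.01 in topic proportion) are chosen for illustration only, and `odds_multiplier` is a hypothetical helper name.

```python
import math


def odds_multiplier(coefficient, change):
    """Multiplicative change in the odds of a higher usage category."""
    return math.exp(coefficient * change)


# Coefficients from Table 7; increments are illustrative assumptions.
sentiment_marijuana = odds_multiplier(-5.48, 0.10)  # +0.10 average sentiment
money_marijuana = odds_multiplier(56.10, 0.01)      # +1 percentage point of finance talk
money_alcohol = odds_multiplier(40.63, 0.01)        # +1 percentage point of finance talk
print(round(sentiment_marijuana, 2), round(money_marijuana, 2), round(money_alcohol, 2))
```

The large raw coefficients on MoneyFinance reflect the scale of that variable: it is a proportion with a mean of 0.03 (Table 4), so a one-unit change is far outside the observed range and a one-percentage-point change is the more meaningful increment.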
6. Discussion and implications
Substance use is a significant public health concern among YEH which can potentially lead to other health-related problems (e.g. risky sex behaviors, mental problems, and sexually transmitted diseases). Thus, researchers need to provide insights and implications regarding substance prevention to policy makers and practitioners. This study tackles this problem by leveraging big data on social media platforms. We asked how social media conversations differ between substance-using and non-using YEH, and whether we can use such differences to detect substance use among YEH. We found answers and evidence for both questions. In the following, we discuss our findings and implications.
6.1 Sentiment and topics of YEH's Facebook conversations
We found that, on average, the sentiment of all FB posts and comments that are authored by our survey respondents is positive. A similar trend has been observed among housed youth: Lin et al. (2014) applied sentiment analysis to FB status updates of 230 undergraduate students (of similar age) from two universities and found participants disclosing more positive emotion than negative emotion. Liu et al. (2017) analyzed 1,879 tweets from 121 first-year students and identified more positive tweets. Our study shows that YEH do not necessarily show a more negative sentiment on social media sites than their housed counterparts.
We identify 31 topics in the FB conversations by YEH. The most frequently mentioned topics include family and friends, appearance, drugs, appreciation, entertainment, money and finance, life, work, food, and religion. We compare this list of topics with topics reported in survey responses from our participants. We found that 32.5% of the participants reported talking about drugs, 26.6% reported talking about sex, 26.2% reported talking about school and/or work, 24% reported talking about family issues, 23.9% reported talking about being homeless, 5.3% reported talking about goals and 7.2% reported talking about safe sex. While some of these topics are common, especially the focus on family and drugs, we were able to unearth unique salient topics of discussion through the use of digital trace data that would not be captured through predetermined survey questions. For example, money and finance is one category that seems to be very important for this group of young people, whereas talking about sex or safe sex does not seem as pertinent, which is contrary to studies using survey data. This likely underscores, as we noted above, the methodological and substantive benefits of using digital trace data.
6.2 Sentiment and topic differences between substance users and non-users
To answer our first research question, we compared the sentiment and topics in FB posts between substance-using and non-using YEH and made the following observations.
First, we found that the sentiment expressed in FB messages differs significantly between users and non-users for all seven substances. As we expected, for alcohol, marijuana, cocaine, and ecstasy, the users expressed more negative sentiments in their conversations than the non-users. This is in line with research that reported the relationship between negative emotions and substance use (e.g. Cooney et al., 1997; Tiffany, 1990). In addition, we confirmed that the relationship between negative feelings in social media content and problematic drinking in the general population (Curtis et al., 2018; Marengo et al., 2019) also existed among the YEH, and applied to certain other substances.
These results provide social scientists with an understanding of the mental well-being of substance-using YEH. Prior studies have shown that sentiment-related indicators from one's social media texts can relate to their health characteristics such as mental well-being (e.g. Chau et al., 2020; De Choudhury et al., 2013; Guntuku et al., 2017; Marengo et al., 2019; Moreno et al., 2012; Schwartz et al., 2014). Frequent expression of negative emotions in status updates can identify individuals experiencing depression. The more negative sentiment expressed by YEH who are using certain substances may indicate a higher risk of experiencing negative mental health, compared to their non-using peers. On the other hand, we found that for crack, heroin, and meth, the users showed more positive sentiments. It is difficult to speculate about the reasons for the mixed sentiment. We need more qualitative information from the participants to explain this discrepancy. Follow-up interviews can be conducted to reveal why YEH's overall sentiment expressed on social media sites changes when they use different drugs. This discrepancy may indicate users' different emotional regulations and reactions to different types of substances (De Arcos et al., 2005).
Second, we found that the topics mentioned in the conversations are significantly different between the two groups. In general, non-users discussed gratitude and appreciation significantly more often than users, except for meth. This pattern is consistent with similar studies that investigate problematic drinking behaviors (Curtis et al., 2018; Marengo et al., 2019) but extends to other substances. This may indicate that non-users have the social bond and support they need, and thus present a higher level of belongingness and trust (Baumeister and Leary, 1995) and a lower level of need for substances (Curtis et al., 2018). This may emphasize the essential role of social support in preventing or reducing substance use among adolescents (Ensminger et al., 1982; Wills and Vaughan, 1989). Intervention programs and social media campaigns should consider incorporating social support and appreciation-related languages to promote feelings of positivity and belongingness.
Non-users also tend to discuss entertainment, life, religion and work more often than users. This pattern is similar to Barman-Adhikari's early study (2016b), in which the authors report that when YEH talk about personal goals and future plans, they are more likely to engage in positive health behaviors. This can also indicate that stable jobs or plans for pursuing stable jobs can reduce the risk of substance use among YEH (Zhang and Slesnick, 2018). Intervention or transition programs that aim at assisting YEH with their job seeking may be implemented to reduce potential substance use.
On the other hand, substance-using YEH tend to discuss money and finance-related issues more often than non-users (except for meth users). These results echo the observations made by Marengo et al. (2019), who found that in youth's FB messages, words related to economics are more frequent among problematic drinkers. Our study confirms similar patterns within the YEH community, with a variety of substances. One possible explanation is that homeless youth need more money to pay for the substances, or that they are more financially disadvantaged due to substance use. This financial hardship, if it exists, can potentially lead to criminal behaviors (Baron, 2007). As we noted above, such patterns may reveal root causes of substance use, or conditions that may accompany substance use behaviors, thus providing potential intervention opportunities. Therefore, it is worthwhile to examine financial hardship among substance-using YEH further.
6.3 Detection of substance use
To answer our second research question, we used sentiment and topics in FB messages to predict the usage level for the two substances: alcohol and marijuana. We found that the more positive a YEH's FB conversations are, the less likely they will be in a high-usage group for marijuana. In terms of topics mentioned in the conversations, the more the youth mention a finance-related topic, the more likely they will be in a high-usage group for alcohol and marijuana.
Identifying homeless youth with a high risk of substance use and investigating their social media conversations are critical steps for social workers and health professionals to provide assistance and interventions. However, this work is extremely challenging due to the hard-to-reach nature of this group, as well as the substantial resources needed if the process is conducted manually (Chau et al., 2020). We identified cues, such as topics mentioned in one's social media conversations, that social workers can pay attention to for proactive and early intervention. This information is relatively easier to access than surveys; however, experts should be mindful of privacy issues that can arise with such applications.
6.4 Contributions
Although exploratory, this study provides important implications for researchers and practitioners. First, we contribute to the stream of research on YEH's mental well-being by providing a descriptive analysis of YEH's social media conversations. Our findings reveal some previously unknown issues and concerns YEH may face in their daily life. Such information can be incorporated in follow-up interviews and surveys so that social workers and researchers can better understand YEH's situations. Second, we extend the existing literature on the association between social media behavior and substance usage by confirming similar patterns in the highly vulnerable group of YEH. We observe patterns in YEH's social media conversations that can be connected to substance use. Implications we draw from these patterns provide researchers a better understanding of the emotional states of substance-using YEH, and the potential causes and triggers of their substance use behavior. Third, we build a model that can predict certain substance use behavior using sentiment and topics expressed in YEH's social media conversations. Our findings can be used to develop online screening instruments for the identification of YEH at risk for substance use, which can support proactive and early intervention. The traditional way of identifying YEH's substance use behavior through the survey method has its limitations, making it hard to assess YEH's health status. There are already studies that apply machine learning techniques to social media conversations as an exploratory tool for detecting abnormal behaviors and mental issues (e.g. Chau et al., 2020; Hwang et al., 2020; Kumar et al., 2022). Our findings support the feasibility and validity of studying YEH's health behavior through mining digital trace data from social media sites without depending on survey data.
The text analytics-based toolkit we provide in this study can automatically detect sentiment and opinion from social media sites, which can be subsequently reviewed and analyzed by experts from different research backgrounds.
6.5 Limitation and future research
As with any other study, this study is not without limitations. The first limitation is the relatively small sample size. As this is an exploratory study of identifying YEH's risky behavior-related features in social media conversations, such features are worth exploring with larger samples in future research. The second limitation is the possibility of social desirability bias related to participants' self-reported data. Participants may not accurately report their substance use behaviors or feel comfortable sharing certain information with interviewers. Participants were reminded that the data were confidential and were encouraged to ask questions while completing the questionnaire to minimize such invalid data. Additionally, the use of computer-assisted self-interviews, which we employed, has been documented to mitigate issues of social desirability and impression management (Schroder et al., 2003) and to lead to more honest and unbiased responses. More importantly, participants were assured that the data would not be shared with any law enforcement agency or with the non-profit agency from which they were receiving services, so that they felt more at ease about providing honest answers. Last, but not least, we want to be careful about not overstating the implications of the study. While we aimed to understand the potential of using social media conversations as a way of understanding substance use among this population, we still relied on survey data to validate and triangulate our findings.
For future work, it is important to consider how the findings of this study can be applied to substance use prevention in real-world settings (i.e. non-profit agencies serving homeless youth). One option is to consider engaging Facebook in efforts to use such algorithms to flag users' substance use behavior. Facebook already uses its own algorithm to detect suicidal ideation. However, such efforts by Facebook have recently become mired in controversy because of concerns about privacy, transparency, and ethical issues. A potential alternative to engaging Facebook would be to create a tool that is less likely to violate such privacy and ethical standards. In addition, since research has shown that online social networks are effective in promoting health behavioral changes (e.g. Song et al., 2019), future research can explore the potential to integrate such screening tools into social network-based substance intervention programs that leverage such contagion effects.
7. Conclusion
Scholars have challenged the IS field to utilize information systems and technologies to address human values such as public health, well-being, and social equality (Venable et al., 2011). Specifically, social media analytics can play a pivotal role to inform public health professionals and develop public health programs and policies (Zhang et al., 2020). To help address YEH's substance use issues and improve their well-being, our study shows the feasibility of using digital trace data from social media to analyze the health behavior of this usually hard-to-reach population. The traditional way of using survey data to reach and assess this group has its limitations. We provide an alternative tool for discovering insights into YEH's sentiment and opinions on social media, as well as detecting certain drug usage, which may benefit public health surveillance and substance prevention applications and programs.
Funding: This study was funded by Professional Research Opportunities for Faculty (PROF), University of Denver.
Notes
1. We removed comments or posts containing only web links or fewer than 4 characters.
2. https://github.com/cjhutto/vaderSentiment?source=post_page
3. A full list of this lexicon can be found at https://github.com/cjhutto/vaderSentiment/tree/master/vaderSentiment
4. Booster words, such as “extremely” and “marginally”, impact sentiment intensity by either increasing or decreasing it.
5. SPSS Modeler's built-in resources help group similar words into types. For example, the “Positive Attitude” type has 484 words, all of which are related to positive attitude. Example words are “caring”, “cheerful”, “friendly” and “responsive”.
6. A rule is a logical expression using extracted concepts, types, and patterns as well as logical operators.
7. IBM SPSS Modeler is a data mining and text analytics software application from IBM.
8. Text analysis packages are built-in templates which contain pre-built topic sets, their descriptors, as well as linguistic resources. Those packages are used to classify texts from different business domains such as brand awareness and employee satisfaction.
9. The full list of topics and their descriptors is available upon request.
Figure 1
Distribution of sentiment scores of 33,204 FB conversations authored by participants
[Figure omitted. See PDF]
Figure 2
Facebook conversation sentiment comparison between non-users and users by substances
[Figure omitted. See PDF]
Figure 3
Topic comparison between non-users and users by substance consumption
[Figure omitted. See PDF]
Figure A1
Conversation sentiment comparison among substance use groups
[Figure omitted. See PDF]
Summary of representative studies
| Literature | Substance studied | Textual features used | Offline substance use prediction? | Population | Findings/Focus |
|---|---|---|---|---|---|
| Zhou et al. (2016) | Weed, cough syrup, prescription pills | Keywords | No | Instagram users | Posts that contain drug-related words reveal common interests shared by drug users |
| Hammond et al. (2018) | Alcohol, marijuana, tobacco | Sentiment and topics | No | College students | College students' sentiment towards substances, and topics around substance use on social media |
| Rose et al. (2017) | Menthol cigarettes | Sentiment and topics | No | Twitter users | General public opinions toward menthol cigarettes |
| Riordan et al. (2019) | Alcohol | Keywords | No | Twitter users | Twitter users express intention to blackout due to celebration or coping reasons |
| Tran et al. (2018) | Marijuana | Sentiment | No | Facebook users | Users' emotional reactions to marijuana-related posts on Facebook |
| ElTayeby et al. (2019) | Alcohol | Word frequency | No | College students on Facebook | Drinking-related posts detection on Facebook |
| Roy et al. (2017) | Substance | Word embeddings | No | Instagram users | Substance related posts detection on Instagram |
| Lu et al. (2019) | Various substances | Linguistic features, word associated emotions | Yes | Reddit users | Prediction of a user's transition into drug addiction |
| Ding et al. (2017) | Tobacco, alcohol, drug | Keywords, word embeddings | Yes | Facebook users | Keywords in one's posts as well as “likes” towards others' posts are related to a user's substance use |
| Marengo et al. (2019) | Alcohol | Topics and keywords | Yes | Facebook users, mainly college students | Prediction of problem drinking based on linguistic features in one's Facebook texts |
| Curtis et al. (2018) | Alcohol | Topics and keywords | Yes | Twitter users | Prediction of county-level excessive drinking behavior using keywords and topics |
| Hassanpour et al. (2019) | Tobacco, alcohol, drug | Word embeddings | Yes | Instagram users | Prediction of individuals' risk for alcohol, tobacco, and drug use based on the content from their Instagram profiles |
| This study | Alcohol, marijuana, cocaine, crack, heroin, meth, ecstasy | Sentiment and topics | Yes | YEH | Topics and sentiments expressed in YEH's social media messages are related to their substance use behavior, but such relationship differs for different substances |
Summary of survey and survey participants' Facebook conversation data
| Data sources | Characteristics | Mean | Std Dev | Min | Max |
|---|---|---|---|---|---|
| From Survey (92 observations) | Age | 20.67 | 1.95 | 18 | 24 |
| | Facebook messages (posts and comments) | 360.91 | 445.93 | 1 | 2,659 |
| | % Male | 55.4% | | | |
| | % Attending school | 17.4% | | | |
| | % Currently working | 33.7% | | | |
| From survey participants' | | | | | |
| Posts (21,179 observations) | Number of characters | 108.53 | 174.07 | 4 | 1,452 |
| Comments (12,025 observations) | Number of characters | 59.71 | 92.93 | 4 | 1,452 |
Top 10 topics and their descriptor examples
| Rank | Topics | Exemplar descriptors | Exemplar messages | # of FB conversations |
|---|---|---|---|---|
| 1 | Appearance | “good looking”, “attractive”, “ugly” | “Feeling and looking great. EMOJI_grinning_squinting_face” | 2,903 |
| 2 | Family/Friends | “friend”, “family”, “family members” | “Be happy who u have in your life today because even if u have only 3 or less friends that shit can be fun crazy like hell. I happy I have had a great summer in meet new people damn but if u are my friend I will go hell in back for u no matter how much u piss me off or something I got u” | 2,270 |
| 3 | Drug | “smoking”, “drunk”, “high” | “Smoking is to remember drinking is to forget I drink to remember; is not that ironic?” | 2,042 |
| 4 | Entertainment | “movie”, “concert”, “cartoon” | “This fuckin movie is my new favorite! Shit is intense!” | 1,092 |
| 5 | Appreciation | “thank”, <Appreciation> | “To all that help me out or even tries thanks a lot I appreciate everything anybody does” | 1,061 |
| 6 | Money and Finance | <Budget>, <Bill>, “payday” | “I need that money like that ring I never want. EMOJI_purple_heart” | 866 |
| 7 | Life | “life”, “part of my life” | “How I think life treats everyone but everyone is different” | 801 |
| 8 | Work | “work” | “And I have a job as well as I teach guitar on the side record artists and sell beats to make rent and other bills and I dont stay in the house all day” | 550 |
| 9 | Food | <Food> | “Breakfast this morning” | 378 |
| 10 | Religion | “God”, “Bible”, “pray” | “God Please Watch Over Us Youngins Out Here. Lay Your Hands Over ALL OF US” | 334 |
Note(s): Total number of messages: 33,204
a Some messages are paraphrased, not directly quoted. The meaning has been preserved
Variable summary statistics
| Variables | Description | Mean | Std Dev | Min | Max |
|---|---|---|---|---|---|
| Demographic information | |||||
| Age | Age of the respondent | 20.62 | 1.95 | 18 | 24 |
| Gender | The gender the respondent identifies with | 62% male or trans male, 38% female or trans female | |||
| School | Whether the respondent is currently attending school | No = 81.6%, yes = 18.4% | |||
| Work | Whether the respondent is currently working | No = 67.8%, yes = 32.2% | |||
| Travel | Whether the respondent has “travelled”a | No = 44.83%, yes = 55.17% | |||
| Education | Highest level of education the respondent has completed | ||||
| Facebook conversation | |||||
| Sentiment | Average sentiment in the respondent's FB conversations | 0.14 | 0.15 | −0.29 | 0.80 |
| MessageLength | Average length (number of characters) of the respondent's FB conversations | 87.34 | 54.79 | 9.00 | 351.04 |
| MessageNumber | Total number of respondent's FB conversations | 363.93 | 451.91 | 1 | 2,659 |
| MoneyFinanceb | Proportion of the respondent's FB conversations that mentions finance related topics | 0.03 | 0.03 | 0 | 0.25 |
| Substance usage (for the past 30 days) | |||||
| Alcohol | Number of days the respondent consumed 5 or more drinks of alcohol, 0 = 0 days, 1 = 1–19 days, 2 = 20 days or more | 0 = 67.82%, 1 = 25.29% | |||
| Marijuana | Number of times the respondent consumed marijuana: 0 = 0 times, 1 = 1–19 times, 2 = 20 times or more | 0 = 34.48%, 1 = 24.14%, 2 = 41.38% | |||
| Cocaine | Number of times the respondent consumed cocaine: 0 = 0 times, 1 = 1–19 times, 2 = 20 times or more | 0 = 88.51%, 1 = 8.05%, 2 = 3.45% | |||
| Crack | Number of times the respondent consumed crack: 0 = 0 times, 1 = 1–19 times | 0 = 98.85%, 1 = 1.15% | |||
| Heroin | Number of times the respondent consumed heroin: 0 = 0 times, 1 = 1–19 times, 2 = 20 times or more | 0 = 96.55%, 1 = 2.30%, 2 = 1.15% | |||
| Meth | Number of times the respondent consumed meth: 0 = 0 times, 1 = 1–19 times, 2 = 20 times or more | 0 = 82.56%, 1 = 11.63%, 2 = 5.81% | |||
| Ecstasy | Number of times the respondent consumed ecstasy: 0 = 0 times, 1 = 1–19 times | 0 = 93.02%, 1 = 6.98% | |||
Note(s): n = 87
a Travelled means the respondent moved by themselves or with friends from city to city after a short period of time
b This is the only topic that is significantly associated with one or more substance usage behavior according to logistic regression analysis. The statistics of other topics are available upon request
Average sentiment scores by substance users and non-users and ANOVA tests
| Substance | Non-users | Users | Difference | Substance | Non-users | Users | Difference |
|---|---|---|---|---|---|---|---|
| Alcohol | 0.114 (0.447) | 0.068 (0.436) | 0.046*** | Marijuana | 0.138 (0.440) | 0.081 (0.431) | 0.057*** |
| Cocaine | 0.097 (0.445) | 0.036 (0.401) | 0.061*** | Crack | 0.094 (0.443) | 0.339 (0.391) | −0.245* |
| Heroin | 0.093 (0.443) | 0.184 (0.407) | −0.091** | Meth | 0.083 (0.440) | 0.144 (0.451) | −0.061*** |
| Ecstasy | 0.098 (0.440) | 0.065 (0.459) | 0.033*** | | | | |
Note(s): Cells include mean, standard deviation, and number of observations. ***: p < 0.001, **:p < 0.01, *:p < 0.05
There are 33,204 messages used in this analysis. However, due to missing values in the usage for ecstasy and meth, only a subset of these messages was utilized in the analysis for these two substances
ANOVA test for proportion result for topic: Group by substance consumption
| Rank | Topic | Alcohol | Marijuana | Cocaine | Heroin | Meth | Ecstasy |
|---|---|---|---|---|---|---|---|
| 1 | Appearance | 2.35%*** (9.77–7.42) | −0.62% (8.26–8.88) | 4.56%*** (8.98–4.42) | 1.95% (8.76–6.81) | −1.18%*** (8.56–9.74) | −0.15% (8.72–8.88) |
| 2 | Friends and Family | 0.02% (6.84–6.82) | 0.35% (7.10–6.75) | 0.62% (6.86–6.24) | −2.12% (6.81–8.94) | 0.64%* (6.92–6.27) | 0.00% (6.83–6.82) |
| 3 | Drug | 0.93%*** (6.44–5.51) | 0.17% (6.17–6.00) | −0.40% (6.01–6.42) | 0.08% (6.04–5.96) | 0.82%* (6.17–5.35) | 0.58% (6.12–5.55) |
| 4 | Entertainment | 0.87%*** (3.69–2.82) | 1.41%*** (4.42–3.01) | 1.51%*** (3.39–1.88) | 0.77% (3.32–2.55) | 0.86%*** (3.46–2.60) | 0.22% (3.35–3.12) |
| 5 | Appreciation | 0.04% (3.37–3.32) | 0.28% (3.56–3.29) | 0.86%** (3.39–2.53) | 1.23% (3.35–2.13) | −1.14%*** (3.16–4.30) | 0.55%* (3.43–2.88) |
| 6 | Life | 1.11%*** (2.89–1.77) | 1.23%*** (3.37–2.14) | 0.73%** (2.44–1.71) | −0.58% (2.40–2.98) | 0.26% (2.44–2.18) | 1.33%*** (2.60–1.27) |
| 7 | Money and Finance | 0.27% (2.68–2.41) | −0.84%*** (1.90–2.74) | 0.16% (2.57–2.41) | −2.56%** (2.54–5.11) | 1.04%*** (2.74–1.70) | 0.04% (2.57–2.53) |
| 8 | Work | 1.30%*** (2.34–1.04) | −0.39%* (1.47–1.86) | 0.81%** (1.82–1.00) | −0.78% (1.77–2.55) | 0.82%*** (1.91–1.09) | 0.90%*** (1.91–1.01) |
| 9 | Food | 0.16% (1.21–1.05) | 0.24% (1.33–1.09) | −0.10% (1.13–1.24) | −0.14% (1.14–1.28) | −0.23% (1.10–1.33) | 0.15% (1.16–1.01) |
| 10 | Religion | 0.56%*** (1.25–0.69) | 0.70%*** (1.55–0.86) | 0.63%*** (1.04–0.41) | 0.57% (1.01–0.43) | 0.50%*** (1.09–0.59) | 0.63%*** (1.10–0.47) |
Note(s): The p-values indicate the significance of ANOVA tests for the difference between the mean % for non-users and the mean % for users, for each topic. ***: p < 0.001, **:p < 0.01, *:p < 0.05. The comparison between crack users and non-users was not performed since the number of users is too low
Ordered logistic estimation results
| Variables | Alcohol | Marijuana | Cocaine | Crack | Heroin | Meth | Ecstasy |
|---|---|---|---|---|---|---|---|
| Sentiment | −4.48 (2.73) | −5.48* (2.42) | −1.33 (2.80) | Cannot concave | 1.88 (3.72) | 2.67 (2.32) | 0.84 (7.10) |
| MoneyFinance | 40.63*** (12.68) | 56.10*** (14.87) | 20.60 (11.21) | | 21.40 (32.77) | 40.14 (19.72) | 22.88 (53.92) |
| McFadden's R2 | 0.20* | 0.23*** | 0.21 | | 0.37 | 0.25* | 0.55* |
| Cox and Snell | 0.28* | 0.39*** | 0.17 | | 0.12 | 0.26* | 0.24* |
| Nagelkerke R2 | 0.35* | 0.45*** | 0.29 | | 0.41 | 0.37* | 0.61* |
| HL p-value | 0.26 | 0.30 | 0.00 | | –a | 0.98 | 1.00 |
| Log Likelihood | −55.18 | −71.77 | −29.25 | | −9.41 | −36.71 | −9.88 |
| # of obs | 87 | 87 | 87 | 87 | 87 | 86 | 86 |
Note(s): Robust standard errors in parentheses, *p < 0.05, **p < 0.01, ***p < 0.001
Control Variables: Age, Gender, Travel, Education, School, Work, MessageLength, MessageNumber
Exemplar descriptors and topics revealed by different text mining techniques
| Technique | Example descriptor | Topic |
|---|---|---|
| Text link analysis | “best” + “friend” | Family and Friends |
| Clustering analysis | “marriage”, “short”, “fear”, “rules”a | Family/Marriageb |
| Frequent term | “friend” | Family and Friends |
Note(s): a Pairwise frequency is calculated for all the terms in the cluster. The highest frequency is 10, between “married” and “short”
b Marriage is a sub-topic under the topic “Family”
Illustrative example of linguistic-based methods
| Method | Mechanism | Original descriptor | Descriptors added | Topic(s) |
|---|---|---|---|---|
| Concept inclusion | Starts with a term, and identifies all the terms that include it | “friend” | “best friend”, “my friend” | Family and Friends |
| Semantic networks | Extends descriptors by identifying synonyms and hyponyms | “children”+ “good” | “child” + “excellent” | Family and Friends |
| Concept derivation | Groups terms by looking at the endings (suffixes) of each component in a term. | “work” | “working” | Work |
Substance questions and answer codes
| Substance | Survey question | Answers |
|---|---|---|
| Alcohol | During the past 30 days, on how many days did you have 5 or more drinks of alcohol in a row, that is, within a couple of hours of each other? | 1:0 days; 2:1 or 2 days; 3:3–5 days; 4:6–9 days; 5:9–19 days; 6:20–29 days; 7: All 30 days |
| Marijuana | During the past 30 days, how many times did you use marijuana? | 1:0 times; 2:1 or 2 times |
| Cocaine | During the past 30 days, how many times did you use any form of cocaine (including powder, coke, blow, or snow) but NOT crack? | Same as above |
| Crack | During the past 30 days, how many times did you use crack, including freebase or rock? | Same as above |
| Heroin | During the past 30 days, how many times have you used heroin (also called smack, junk, or China White)? | Same as above |
| Meth | During the past 30 days, how many times have you used methamphetamines (also called meth, speed, crystal, crank, or ice)? | Same as above |
| Ecstasy | During the past 30 days, how many times have you used ecstasy (also called MDMA or X)? | Same as above |
© Emerald Publishing Limited.
