Social psychologists have found that the words people use can give insight into their thoughts and behavior (Pennebaker et al., 2003). For example, people’s word use reflects their personalities (Pennebaker and King, 1999). Word use can also predict future behavior (Guntuku et al., 2020). Studies have found that depressed college students used more self-focused language than non-depressed students, as did poets who later died by suicide compared with poets who did not (Stirman and Pennebaker, 2001; Rude et al., 2004).
Beyond differences between individuals, researchers have also used language to explore differences between regions (Chung et al., 2014). For example, an analysis of language use on Twitter found that areas where tweets expressed more negative emotions, particularly anger, had higher rates of heart attacks (Eichstaedt et al., 2015). In another study, sentiment toward the Affordable Care Act (“Obamacare”) on Twitter predicted differences in enrollment across states (Wong et al., 2015). Researchers have also used cross-cultural variation in word use to study differences in expressions of politeness (Li et al., 2020), temporal orientation (Hou et al., 2024), and psychological stressors (Cui et al., 2022).
These studies suggest that language use can give insight into both people’s psychology and regional differences. In this study, we analyze over a billion words from Weibo, a Chinese microblogging platform similar to Twitter, to gain insight into regional differences across China. Like Twitter posts, Weibo posts tend to be short. At the time of our data collection, posts were limited to 140 characters, and the median post in our dataset is 14 Chinese characters long. This is similar to the average sentence length in Chinese, according to one estimate (Xi et al., 2022).
On Weibo, people typically post about things they are doing and their reactions to news events. For example, a user in Guangdong posted, “I’m not allowed to leave the country to travel for half a year, I’ll stick it out!” A user in Hebei posted, “Wasn’t life just loneliness all along.” Posts are public, unlike private messages such as texts or emails. Not surprisingly for a tech platform, users tend to be younger and more educated than the population as a whole (Koetse, 2015).
To frame our search, we test categories and constructs that cultural psychology has linked to individualism and collectivism. We also use...