Abstract
Traditional studies of semantic prosody, which often rely on manual observation of a limited number of concordance lines, face constraints such as small scale, susceptibility to subjective judgment, and difficulty in capturing emotional nuances in broader contexts. This study uses sentiment analysis to conduct a contrastive investigation of semantic prosody in English and Chinese, focusing on adverbs expressing completeness. The findings reveal that the specific context or "span" of text significantly influences the observed polarity strength of semantic prosody, and that sentiment analysis incorporating contextual and syntactic factors yields more precise and nuanced results. Differences in semantic prosody between translation equivalents in the two languages reflect differences in lexicalization patterns: English employs a wider range of adverbs to convey subtle emotional distinctions, while Chinese relies on more general terms with broader semantic ranges and an overall positive tendency. This research demonstrates the potential of computational methods in the study of semantic prosody and in contrastive linguistics, offering an efficient and scalable approach to data-driven analysis.
Keywords
Semantic prosody; Sentiment analysis; Corpus linguistics
1. Introduction
Semantic prosody, a key concept in corpus linguistics that refers to the consistent positive or negative semantic atmosphere a node word acquires through habitual collocation, has attracted considerable attention in corpus-based research (e.g., Sinclair, 1987; Louw, 1993; Stubbs, 1995; Wei, 2002a; Wei, 2002b; Wei, 2006). It typically describes the phenomenon whereby seemingly neutral words acquire positive or negative associative meanings through frequent co-occurrence with particular collocates (Sinclair, 1991, pp. 74-75). Previous research has focused mainly on its definition (Louw, 1993; Partington, 1998; Sinclair, 1999; Louw, 2000), research methodology (Sinclair, 1996; Wei, 2002a), and implications for fields including cross-linguistic translation (Kenny, 1998; Kenny, 2001), second language learning (Xiao & McEnery, 2006; Wei, 2006), and English for specific purposes (Wei, 2002b).
Influenced by Sinclair's (1996) concept of "extended units of meaning" and his four-step identification procedure ("collocation -> colligation -> semantic preference -> semantic prosody"), studies of semantic prosody have often relied, explicitly or implicitly, on the notion of collocation: the semantic prosody of a node word is inferred by observing its collocates within a certain "span" of text in a limited number of concordance lines, sometimes combined with data-driven extraction of collocates (Wei, 2002a; Wei, 2002b). These methods face constraints such as small scale, susceptibility to subjective judgment, and difficulty in capturing emotional nuances in broader contexts that may lie beyond the span or the observed collocates. First, the emotion and attitude a word conveys in a specific context are not always acquired through "contagion" from its collocates; in many cases they derive from a broader context (e.g., the sentence, a group of sentences, or even the paragraph). Second, the meanings of the collocates themselves are often shaped by context. Examining semantic prosody by extracting collocates from their context (i.e., stripping them of that context) therefore has certain limitations.
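The span-based procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: whitespace tokenization, a configurable symmetric span, and the hypothetical function name `collocates_in_span`. The collocate counts it returns would still have to be judged for polarity by the analyst, which is where subjective judgment enters the traditional approach.

```python
from collections import Counter

def collocates_in_span(tokens, node, left=4, right=4):
    """Count the words occurring within `left`/`right` tokens of each occurrence of `node`.

    Hypothetical helper for illustration; the default span size is an assumption.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left_context = tokens[max(0, i - left):i]
        right_context = tokens[i + 1:i + 1 + right]
        counts.update(left_context + right_context)
    return counts

# Toy concordance material (invented for illustration).
text = "the plan failed utterly and the team was utterly exhausted after the long failure"
tokens = text.lower().split()
print(collocates_in_span(tokens, "utterly", left=2, right=2))
```

Note that "failure", which plainly contributes to the negative atmosphere of the second sentence, falls outside a two-word span and is never counted, which illustrates the limitation discussed above.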
Sentiment analysis, which uses natural language processing, text analysis, and computational linguistics to systematically identify, extract, quantify, and study affective states and subjective information, may offer a remedy for these weaknesses of the current paradigm. The technique has been widely applied to voice-of-the-customer materials such as reviews and survey responses, to online and social media, and to healthcare materials, in applications ranging from marketing and customer service to clinical medicine. Compared with the attitudinal meaning of discrete collocates of the node word, sentiment offers a more dynamic and effective way of capturing semantic prosody understood as "a consistent aura of meaning" (Louw, 1993, p. 157), "the spreading of connotational coloring beyond single word boundaries" (Partington, 1998, p. 68), "an impression of an attitudinal or pragmatic meaning" (Sinclair, 1999), or "a pervasive semantic atmosphere" (Wei, 2002a, p. 300; Wei, 2002b, p. 166).
This study explores how effectively sentiment analysis can identify and quantify semantic prosody, with particular interest in comparing apparent translation equivalents, in order to bridge corpus linguistics and computational approaches to cross-linguistic semantic analysis. We address the following questions:
(1) How effective is sentiment analysis in identifying and quantifying semantic prosody, compared with traditional collocation-based approaches?
(2) How can the sentiment-based method be applied to compare the semantic prosody of equivalent words across languages?
2. Sentiment Analysis
Sentiment analysis is an emerging and rapidly developing interdisciplinary field spanning linguistics and computer science, and is currently one of the most active research areas in computer science (Feldman, 2013). Sentiment can be expressed in language as a positive or negative evaluation. Sentiment analysis research infers writers' subjective feelings about an object or event from the positive and negative words in a text, from the context of those words, or from the linguistic structure of the text, and thereby automatically determines the sentiment (positive or negative attitude) the text contains (Taboada, 2016; Valenzuela, 2017). A common application is to determine automatically whether reviews published online or on social media (movie reviews, book reviews, product evaluations, etc.) express a positive or negative attitude towards the reviewed object. It is now a standard computational tool used by businesses, marketers, and political analysts to analyze social media. For example, a movie studio can collect all the reviews people post about a film on Twitter or Facebook, extract attitude information from the text's meaning and semantic prosody, and classify each text as a positive or negative review (or, in the field's terms, as having positive, negative, or neutral sentiment). Since sentiment analysis focuses primarily on the emotional attitude of words and sentences, and semantic prosody research examines the emotion and attitude carried by the collocates of a node word, the two are closely related.
Applying sentiment analysis to literary texts is also a very active research area. Jockers (2014) used sentiment analysis to study the plots of more than 2,000 novels, reducing them to six (or seven) basic plot shapes.¹ Piper and So (2015) contrasted the use of emotion words in best-selling novels and in other (e.g., classic) novels, finding that classic works, especially realist ones, use significantly fewer emotion words than romantic and best-selling novels. Silge and Robinson (2018) analyzed and compared sentiment scores across Jane Austen's six novels, along with the most frequent emotion words in them, finding that positive sentiment clearly dominates in Austen's novels.
A commonality in the above analyses is that they all rely on sentiment dictionaries to count the positive and negative sentiment words in a given text segment and thereby calculate its sentiment value. This method has an inherent weakness: a sentiment dictionary gives the most likely sentiment value of a word independently of any specific context, yet, like its semantic value, a word's sentiment value often varies with context.
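The limitation can be made concrete with a minimal Python sketch of dictionary-based scoring. The toy lexicon and the function name `naive_score` are hypothetical and are not the dictionaries used in the studies above; the point is only that each word is looked up in isolation, so a negated sentence scores the same as its affirmative counterpart.

```python
import re

# Hypothetical toy lexicon (word -> polarity); real studies use full dictionaries.
LEXICON = {"good": 1.0, "happy": 1.0, "bad": -1.0, "failure": -1.0}

def naive_score(sentence):
    """Dictionary-based score: sum each word's lexicon value, ignoring context."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(w, 0.0) for w in words)

print(naive_score("The result was good."))      # 1.0
print(naive_score("The result was not good."))  # 1.0 (the negator is ignored)
```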
To overcome the problems of the simple dictionary-based method, Rinker (2018) incorporated four types of "valence shifters" into the algorithm of his sentiment analysis package sentimentr; these reverse, strengthen, or weaken the basic word sentiment values obtained from a sentiment dictionary (Raja, 2017; Dey, 2018). They are: negators (e.g., "not", "can't"), amplifiers (e.g., "absolutely", "certainly"), de-amplifiers (e.g., "almost", "barely"), and adversative conjunctions (e.g., "although", "that being said"). Rinker pointed out that in the corpus he examined, about 20% of sentiment words co-occurred with such shifters, so shifters clearly cannot be ignored in sentiment calculation. sentimentr also takes word order into account, which simple dictionary-based methods (e.g., the syuzhet package) do not.
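The idea of valence shifters can be illustrated with the Python sketch below. It is a simplified stand-in written for illustration, not sentimentr's actual algorithm: the toy word lists, the three-word look-back window, and the weights `amp=0.8` and `adv_weight=0.5` are all assumptions. Each polarized word is adjusted according to the shifters that precede it, and an adversative conjunction shifts weight from the clause before it to the clause after it.

```python
import re

# Toy word lists (hypothetical); sentimentr ships much larger dictionaries.
POLARITY = {"good": 1.0, "happy": 1.0, "bad": -1.0, "ruined": -1.0}
NEGATORS = {"not", "never", "can't"}
AMPLIFIERS = {"absolutely", "certainly", "utterly"}
DEAMPLIFIERS = {"almost", "barely"}
ADVERSATIVES = {"but", "however"}

def shifted_score(sentence, window=3, amp=0.8, adv_weight=0.5):
    """Sum word polarities after adjusting each one for nearby valence shifters.

    Simplified illustration, not sentimentr's algorithm: the look-back window
    and the weights `amp` and `adv_weight` are assumed values.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    # Position of the last adversative conjunction, or -1 if there is none.
    pivot = max((i for i, w in enumerate(words) if w in ADVERSATIVES), default=-1)
    total = 0.0
    for i, w in enumerate(words):
        if w not in POLARITY:
            continue
        value = POLARITY[w]
        context = words[max(0, i - window):i]
        # An odd number of negators in the window reverses the polarity.
        value *= (-1) ** sum(c in NEGATORS for c in context)
        if any(c in AMPLIFIERS for c in context):
            value *= 1 + amp    # amplifier strengthens
        if any(c in DEAMPLIFIERS for c in context):
            value *= 1 - amp    # de-amplifier weakens
        if pivot >= 0:
            # Clause before the adversative is down-weighted, clause after up-weighted.
            value *= (1 - adv_weight) if i < pivot else (1 + adv_weight)
        total += value
    return total

print(shifted_score("The result was not good."))             # -1.0
print(shifted_score("It was good but the ending was bad."))  # -1.0
```

Unlike plain dictionary counting, the negated sentence now scores negative, and in the second example the negative clause after "but" outweighs the positive one before it.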
3. A Sentiment-based Approach to Semantic Prosody
In the following discussion, we attempt a sentiment-based approach by examining the semantic prosody of six English adverbs of completeness ("completely", "entirely", "thoroughly", "totally", "utterly", and "wholly") in the British National Corpus (BNC, 100 million words). First, we extract all sentences in the corpus that contain any of the six adverbs. Then, using the sentiment dictionary provided by the R package sentimentr, we assign a value to each sentiment word in each sentence. Following the sentimentr algorithm, we sum the sentiment values of all words in each sentence and divide the sum by the square root of the number of words in the sentence, which adjusts the sentiment value for sentence length. The results are shown in Figure 1, where element_id is the sentence ID and sentiment is the sentence sentiment value; in the facet label for each adverb, the pos and neg labels and the numbers after them give the proportions of sentences with positive and negative sentiment, respectively.
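A minimal Python sketch of this scoring step follows. It assumes a toy lexicon in place of the sentimentr dictionary and a simple regular-expression tokenizer, and it omits valence shifters, so only the sum-over-square-root normalization reflects the procedure described above. The `pos` and `neg` shares are the kind of proportions reported in the facet labels of Figure 1; sentences scoring exactly zero count as neutral and fall into neither share.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy lexicon standing in for the sentimentr dictionary.
LEXICON = {"happy": 1.0, "content": 0.5, "failed": -1.0, "wrong": -0.75}
ADVERBS = ("completely", "entirely", "thoroughly", "totally", "utterly", "wholly")

def sentence_sentiment(sentence):
    """Sum of word sentiment values divided by the square root of the word count."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0.0) for w in words) / math.sqrt(len(words))

def polarity_proportions(sentences):
    """For each adverb, the share of its sentences scoring above and below zero."""
    scores = defaultdict(list)
    for s in sentences:
        words = set(re.findall(r"[a-z']+", s.lower()))
        for adverb in ADVERBS:
            if adverb in words:
                scores[adverb].append(sentence_sentiment(s))
    return {
        adverb: {
            "pos": sum(v > 0 for v in vals) / len(vals),
            "neg": sum(v < 0 for v in vals) / len(vals),
        }
        for adverb, vals in scores.items()
    }

# Invented example sentences, not corpus data.
sample = [
    "She was utterly content.",
    "The plan failed utterly.",
    "He was totally wrong.",
    "They were completely happy.",
]
print(polarity_proportions(sample))
```

Dividing by the square root rather than by the raw word count dampens, without fully removing, the advantage long sentences have in accumulating sentiment words.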
A comparison of the results with those reported in previous research (e.g., Louw, 1993; Partington, 2004; Wei, 2006) suggests that the proposed approach is plausible.
4. Semantic Prosody in Translated and Original Essays
Using the above analysis for reference and comparison, we next use the sentimentr package to analyze our self-built corpus of English-Chinese essays.² We use the sub-corpus of original English essays to analyze the semantic prosody of the six adverbs of completeness, the corresponding sub-corpus of translated Chinese essays to analyze the semantic prosody of their Chinese translation equivalents, and the sub-corpus of original Chinese essays to analyze the semantic prosody of the same Chinese words in non-translated writing. The results of the sentiment analysis of the six adverbs in the English sub-corpus are shown in Figure 2.
Although the evidence in Figure 2 is limited, the sentiment analysis results for the six adverbs in the English essay corpus are broadly similar to those obtained from the BNC (Figure 1). The ranking differs, however: "thoroughly" and "entirely" both fall below "wholly". This may be due partly to stylistic differences between the corpora and partly, to some extent, to corpus size.
We next ask whether the Chinese words corresponding to the six adverbs behave similarly in terms of sentiment. First, we use the pairwise_cor function (from the R package widyr, commonly used together with tidytext) to identify, across English-Chinese parallel sentences, the Chinese words most highly correlated with the six adverbs. The two Chinese words with the highest correlations are "完全" (wánquán, corresponding mainly to "completely", "entirely", "totally", and "wholly") and "彻底" (chèdǐ, corresponding mainly to "utterly" and "thoroughly").
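The correlation step can be sketched in Python. With binary presence/absence data, the Pearson correlation that pairwise_cor computes by default is the phi coefficient; the sketch re-implements that calculation directly rather than calling the R function. It assumes that each aligned sentence pair is one unit of observation and that the Chinese side has already been word-segmented; the sentence pairs and the helper names `phi` and `top_correlates` are invented for illustration.

```python
import math

def phi(pairs, en_word, zh_word):
    """Phi coefficient between en_word occurring on the English side and
    zh_word occurring on the Chinese side of aligned sentence pairs."""
    n11 = n10 = n01 = n00 = 0
    for en_tokens, zh_tokens in pairs:
        a, b = en_word in en_tokens, zh_word in zh_tokens
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

def top_correlates(pairs, en_word, k=2):
    """Rank every Chinese token by its phi coefficient with en_word."""
    vocab = set().union(*(zh for _, zh in pairs))
    scored = [(zh, phi(pairs, en_word, zh)) for zh in vocab]
    return sorted(scored, key=lambda x: (-x[1], x[0]))[:k]

# Invented toy data: (English tokens, segmented Chinese tokens) per sentence pair.
pairs = [
    ({"he", "completely", "forgot", "it"}, {"他", "完全", "忘了"}),
    ({"she", "utterly", "failed"}, {"她", "彻底", "失败了"}),
    ({"the", "room", "was", "utterly", "ruined"}, {"房间", "彻底", "毁了"}),
    ({"they", "left"}, {"他们", "走了"}),
    ({"a", "totally", "new", "idea"}, {"一个", "完全", "新", "想法"}),
]
print(top_correlates(pairs, "utterly"))
```

On real data, rare words would normally be filtered out first, since a word that occurs only once alongside an adverb can reach a high coefficient by chance.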
Next, we examine the sentiment of these Chinese equivalents in the translated and original Chinese essays. As with the six English adverbs, we extract all sentences containing "完全" and "彻底" from the two sub-corpora and calculate the overall sentiment of each sentence.³ The results are shown in Figure 3.
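The extraction step for the Chinese sub-corpora might look like the Python sketch below. It rests on simplifying assumptions: sentences are split at 。！？ and target words are found by plain substring matching, which can over-match (for example, 不完全 "not completely" also contains 完全). Word segmentation and the sentiment scoring of the extracted sentences follow Ding (2024) in the study and are not reproduced here.

```python
import re

TARGETS = ("完全", "彻底")

def split_sentences(text):
    """Split Chinese text after sentence-final punctuation (。！？), keeping it."""
    return [s for s in re.split(r"(?<=[。！？])", text) if s.strip()]

def sentences_with_targets(text, targets=TARGETS):
    """Collect the sentences that contain each target word (by substring match)."""
    found = {t: [] for t in targets}
    for sentence in split_sentences(text):
        for t in targets:
            if t in sentence:
                found[t].append(sentence)
    return found

# Invented example text.
sample = "我完全同意。他彻底失败了！你好吗？"
print(sentences_with_targets(sample))
```

Splitting on a zero-width lookbehind keeps each punctuation mark attached to its sentence; this requires Python 3.7 or later.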
As Figure 3 shows, although the sentiment values of sentences containing "完全" and "彻底" differ somewhat between the translated and original Chinese essays, both words show a clearly positive tendency (on average, the proportion of positive sentiment is more than twice that of negative sentiment). That is, both Chinese words carry a clear positive semantic prosody in both corpora. Given the marked differences in semantic prosody among the six English adverbs in the English essays, we believe this result, on the one hand, supports the "sanitization" of semantic prosody in translated texts proposed by Kenny (1998; 2001, pp. 167-170), whereby the translated text appears "somehow tamer than the original" (1998, p. 520). On the other hand, it is also an objective consequence of differences in lexicalization between the two languages. Just as the Inuit are often said to have over 50 words for snow and 70 for ice, and the Sami over 1,000 words for reindeer, English has more adverbs expressing completeness, allowing finer nuances of meaning, whereas the corresponding Chinese words are fewer and consequently broader and less sharply delimited in meaning. The two Chinese words analyzed above can therefore be seen as a fusion of the six English adverbs of completeness, with a more positive overall sentiment.
5. Conclusion
This study demonstrates the efficacy of combining corpus linguistics with computational sentiment analysis in the cross-linguistic study of semantic prosody. Analyzing English adverbs of completeness and their Chinese equivalents, we show that semantic prosody is shaped not only by collocational patterns but also by broader contextual factors and language-specific lexicalization.
Key findings indicate that while English adverbs exhibit nuanced and varied prosodic profiles, their Chinese counterparts display a stable and predominantly positive prosody in both translated and original texts. This divergence underscores the potential "sanitization" effect in translation, where prosodic sharpness may be neutralized, and highlights typological differences between English and Chinese in the granularity of expressive means.
This study contributes to contrastive linguistics, translation studies, and corpus-based semantic analysis by offering a scalable, data-driven approach to semantic prosody. Future work could refine sentiment algorithms to capture finer-grained pragmatic and cultural dimensions.
Funding
This research was supported by the 2025 Zhejiang Fund for Training College Students' Innovation and Entrepreneurship (Project title: AI Watching the Mind: A Large Model-based Monitoring System for Social Media Behavior of University Students; Grant No.: 202410347051).
References
Brosseau, P. (2015). There are only six basic book plots, according to computers. Motherboard. Retrieved November 30, 2018, from https://motherboard.vice.com/en_us/article/8qxkkb/computers-find-that-there-are-six-plots
Ding, G. (2020). A corpus-based study of word class relations in English-Chinese translation. Foreign Language Teaching and Research, 52(5), 773-785.
Ding, G. (2024). Triangulating text relationship in literary retranslation: The Great Gatsby in Chinese. Digital Scholarship in the Humanities, 39(3), 849-863.
Feldman, R. (2013). Techniques and applications for sentiment analysis: The main applications and challenges of one of the hottest research areas in computer science. Communications of the ACM, 56, 82-89.
Jockers, M. L. (2014). A novel method for detecting plot. Retrieved November 29, 2018, from http://www.matthewjockers.net/2014/06/05/a-novel-method-for-detecting-plot
Kenny, D. (1998). Creatures of habit? What translators usually do with words. Meta, 43(4), 515-523.
Kenny, D. (2001). Lexis and creativity in translation: A corpus-based study. St. Jerome.
Louw, B. (1993). Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John Sinclair (pp. 157-176). John Benjamins.
Louw, B. (2000). Contextual prosodic theory: Bringing semantic prosodies to life. In C. Heffer & H. Sauntson (Eds.), Words in context: A tribute to John Sinclair on his retirement (pp. 48-94). University of Birmingham.
Partington, A. (1998). Patterns and meanings: Using corpora for English language research and teaching. John Benjamins.
Partington, A. (2004). "Utterly content in each other's company": Semantic prosody and semantic preference. International Journal of Corpus Linguistics, 9(1), 131-156.
Piepenbring, D. (2015). Man in hole: Turning novels' plots into data points. The Paris Review. Retrieved November 25, 2018, from https://www.theparisreview.org/blog/2015/02/04/man-in-hole/
Piper, A., & So, R. J. (2015, December 18). Quantifying the weepy bestseller. New Republic. Retrieved November 21, 2018, from https://newrepublic.com/article/126123/quantifying-weepy-bestseller
Silge, J., & Robinson, D. (2018). Introduction to tidytext. Retrieved November 21, 2018, from https://cran.r-project.org/web/packages/tidytext/vignettes/tidytext.html
Sinclair, J. (1987). Collocation: A progress report. In R. Steele & T. Threadgold (Eds.), Language topics: Essays in honour of Michael Halliday (pp. 319-332). John Benjamins.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford University Press.
Sinclair, J. (1996). The search for units of meaning. Textus, 9(1), 75-106.
Sinclair, J. (1999). Concordance tasks. The Tuscan Word Centre. Retrieved November 2, 2018, from http://www.twc.it/happen.html
Stubbs, M. (1995). Collocations and semantic profiles: On the cause of the trouble with quantitative studies. Functions of Language, 2(1), 23-55.
Taboada, M. (2016). Sentiment analysis: An overview from linguistics. Annual Review of Linguistics, 2, 325-347.
Valenzuela, J. (2017). Meaning in English: An introduction. Cambridge University Press.
Wei, N. (2002a). Methods of research on semantic prosody. Foreign Language Teaching and Research, 34(4), 300-307.
Wei, N. (2002b). A corpus-driven study of semantic prosodies in specialized texts. Modern Foreign Languages, 25(2), 165-175.
Wei, N. (2006). A corpus-based contrastive study of semantic prosodies in learner English. Foreign Language Research, (5), 50-54.
Xiao, R., & McEnery, T. (2006). Collocation, semantic prosody, and near synonymy: A cross-linguistic perspective. Applied Linguistics, 27(1), 103-129.
1 See also the analyses conducted by Brosseau (2015) and Piepenbring (2015) on this matter.
2 For information on this corpus, see Ding (2020).
3 For the methods and sentiment dictionaries used to calculate the sentiment of Chinese sentences, see Ding (2024).
© 2025. This article is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.