Introduction
The analysis of language production has emerged as a potent method for detecting cognitive abnormalities in Alzheimer's disease (AD).1–4 The sensitivity of language in detecting AD extends to several years before the official diagnosis of the disease.5,6 Analysis of short language samples from a picture description task in cognitively unimpaired individuals has a higher accuracy of predicting future AD compared to models based on neuropsychological test scores, demographic variables, and APOE results.5 Typical language abnormalities indicative of AD in English include using a higher rate of pronouns, shorter sentences, and an increased rate of adverbs.2,4,7–10 Despite its evident clinical value, the mechanism through which cognitive impairments of AD affect language production remains poorly understood. Questions also remain about whether the observed language changes are isolated findings or manifestations of a single core cognitive deficit. Understanding the underlying mechanism of language abnormalities in AD has direct clinical applications by enhancing the accuracy of models used for early disease detection and targeting core rather than secondary symptoms during treatment.
To elucidate the origin of language abnormalities in AD, it is crucial to identify the stage of language production at which AD-related cognitive impairments give rise to the characteristic language deficits. Two possibilities can be considered in this regard. The first is that these language abnormalities reflect impairments in cognitive capacities required to establish the surface structures of a language, such as the particular word order, assignment of grammatical gender, or other processes specific to a given language.11,12 For example, the reduced use of a word type in patients with AD (pwAD) might relate to the particular order in which that word type appears in a language, rendering it more vulnerable to elimination. In this scenario, a different language with distinct structural rules may not exhibit a similar impairment in the use of that word type. Alternatively, language abnormalities of AD may emerge from a deeper layer of language production where meaning is constructed before language-specific rules are applied. In this scenario, the language abnormalities of AD would not be bound to a particular language, as they relate to a more universal aspect of language production: the formation of an informative message.
To address these questions, we performed two stages of analysis. First, we investigated the extent to which language features of AD in one language can be transferred to another language with different surface structures. Specifically, we tested the hypothesis that language features associated with AD in English could reliably classify pwAD versus controls in Persian. Persian (also known as Farsi) is typologically distant from English, stemming from the Indo-Iranian branch of the Indo-European language family13 (Fig. 1). Unlike English, which follows a subject–verb–object (SVO) word order, Persian follows a subject–object–verb (SOV) word order.14–16 One immediate outcome of this structure is that there are usually more words intervening between the subject and the verb in Persian than in English (see Fig. 2 for a comparison). In Persian, as in many SOV languages, adjectives are placed after the nouns they modify.17 Persian also features a large, open-ended set of complex predicates comprising a nonverb element (e.g., a noun or adjective) followed by a light verb.18,19 Given these substantial structural disparities with English, Persian is a compelling candidate language for the objectives of this study. We extracted various language features indicative of AD in both English and Persian using transcriptions from participants as they described a picture. We then tested whether the features indicative of AD in English accurately classified pwAD who spoke Persian. Poor generalizability would indicate that language abnormalities of AD stem from the inability to maintain surface features specific to English. Conversely, a high degree of transferability to Persian would suggest that the linguistic abnormalities of AD reflect disruptions at a deeper level of language production shared by both languages.
[IMAGE OMITTED. SEE PDF]
[IMAGE OMITTED. SEE PDF]
Second, we hypothesized that the deeper level from which language abnormalities of AD originate is the stage of constructing an informative message. To test this hypothesis, we measured the informativeness of language produced by speakers of both languages and evaluated how informativeness correlates with language abnormalities of AD. To measure language informativeness, we used a novel metric we refer to as the Language Informativeness Index (LII) (see Box 1). Using a Large Language Model (LLM), LII measures the similarity of a target language sample to a highly informative description of the picture used in the language production task. We evaluated whether the typical language abnormalities of AD correlate with the LII measure in both English and Persian. Significant correlations in both languages would provide evidence for the hypothesis that deficits in message formulation represent the pivotal stage at which language abnormalities in AD take root.
Box 1
Language Informativeness Index (LII) and a Review of Measuring Language Emptiness in AD
A common method of measuring language informativeness, especially in language samples derived from picture description tasks, involves measuring the density of ideas. The literature varies in defining what constitutes an “idea.” Within this method, one approach is to evaluate the presence of specific items relative to language topics, referred to as content units or information units.6,20–23 For instance, when describing the Cookie Theft picture, key content units might include “mother” and “stealing cookies.” In a classic study, Croisile et al. identified 23 information units in the picture.21 This approach involves categorical scoring, assigning either a one or zero, without considering the varying salience of each information unit, such as “stealing cookies” versus “curtains.” Additionally, manual implementation of this method can be highly time-consuming. While automation is possible through word detection from a predefined list,24 the method remains sensitive to specific word choices. There are numerous ways to express a concept, particularly action items. For instance, “stealing cookies,” “snatching sweets,” and “taking cookies without permission” all convey a similar idea, making it challenging to capture all variations using either manual or automated measures.
An alternative method for quantifying idea density involves counting propositions, typically defined as verbs, adjectives, adverbs, or prepositional phrases within every 10 words of a text.25,26 In this approach, parts of speech that make insignificant contributions to the overall meaning still result in an increased informativeness score, as seen with adverbs in this sentence, “The kitchen looks so very messy.”
Other approaches involve measuring word frequency and lexical diversity.27 While these methods gauge the proficiency of patients in producing unique and informative words, they are not sensitive to the specific topic of speech. For example, a participant who employs a sophisticated and varied lexicon while recounting an old memory rather than addressing the required task of describing the Cookie Theft picture might still receive a high informativeness score.
In this work, we introduce LII to measure language informativeness by measuring the similarity of participants' language samples to a highly informative reference, here a detailed description of the Cookie Theft picture. We employed an artificial intelligence-based image-to-text tool to generate a reference text, ensuring that it contains at least the classic set of 23 information units. Subsequently, we used a Large Language Model (LLM) to measure the similarity between each sample and the reference text (see Methods for more details). This metric allows for a graded scaling of informativeness rather than categorical scoring. Furthermore, LII is sensitive to the specific topic of language without being bound to particular word choices.
Materials and Methods
Participants
The English cohort
We obtained English samples from DementiaBank, a component of the TalkBank project.28 The dataset was collected at the University of Pittsburgh as part of the Alzheimer Research Program between 1983 and 1988 during a 5-year follow-up, with comprehensive information available in Becker et al.20 Inclusion criteria consisted of being older than 44 years, having at least 7 years of education, not having previous neurologic disorders, not taking neuroleptic drugs, having a score of at least 10 on the Mini-Mental State Exam (MMSE), and being able to give informed consent. In this cohort, Alzheimer's disease was diagnosed using a detailed clinical evaluation and follow-up protocol. At the time of study inception, the criteria from the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) did not exist. Therefore, the diagnosis was based on a comprehensive neuropsychiatric evaluation, including medical history, physical and neurologic examinations, psychiatric interviews, and neuropsychological assessments. This assessment was supplemented by laboratory tests, EEG, and CT scans. Individuals were diagnosed with probable Alzheimer's disease based on a history of progressive cognitive and functional decline and an abnormal mental status examination. The Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III),21 was used to ensure that major nervous system disorders and other psychiatric conditions were excluded. Each evaluation was conducted over approximately three sessions, typically within a 2-week period. The aim of this thorough assessment was to ensure a meticulously screened and uniformly evaluated cohort of Alzheimer's disease patients and control subjects for a longitudinal study.
From this dataset, we included 103 patients with probable AD and 53 age-matched healthy individuals. The imbalance in sample size between the two groups arises because the healthy individuals in the corpus were, on average, younger than the patients. Therefore, we selected a subset of healthy individuals to ensure they were age-matched with the AD patients. Because many participants contributed multiple language samples over the longitudinal follow-up, we included only their first sample. PwAD were at the mild stage of dementia with an average MMSE of 18.7 (SD = 5.2).
The Persian cohort
Twenty-five Persian-speaking pwAD and 25 age-matched healthy individuals were recruited from the Brain and Cognition Clinic in Tehran, Iran. A complete clinical history was taken from patients and their caregivers. Demographic features, clinical presentations, and medical and family histories were covered in the interviews. A complete clinical examination was performed, emphasizing the assessment of motor features such as Parkinsonism, praxis, language and speech, gait, and balance. The cognitive examination included the Persian versions of Addenbrooke's Cognitive Examination (ACE)22 and the Mini-Mental State Exam (MMSE).23 Blood tests and the brain MRI of all patients were reviewed to confirm the diagnosis and rule out other medical conditions. The blood tests included complete blood count, biochemistry, renal and liver function tests, vitamin B12, 25-OH-D3, thyroid function test, syphilis, and HIV serologic test. Mild or Major Neurocognitive Disorders and Alzheimer's disease were diagnosed by DSM-5 and NINCDS-ADRDA criteria,24 respectively. PwAD in the Persian cohort were at the mild stage of dementia with an average MMSE of 18.0 (SD = 5.1). Neurotypicality was determined based on a thorough clinical examination of all participants. All but two healthy individuals (92%) had brain MRIs. Two independent radiologists determined the normal status of the brain MRIs. The two healthy individuals who were not able to undergo brain MRI due to logistical reasons had an MMSE of 30 each. The research section of the Persian cohort was approved by the ethics committee of Iran University of Medical Sciences, which governs human subjects research in accordance with their guidelines. All participants provided written informed consent to participate in this study. We certify that the study was performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments.
This study particularly champions biocultural diversity through the inclusion of participants from two linguistically and culturally distinct cohorts.
The demographic matching began by aligning healthy individuals in the English and Persian cohorts. As a result, there was no statistically significant difference in age (t = −0.75, P = 0.46) or sex ratio (χ2 = 0.02, P = 0.89) between these two groups. We then matched the AD group in each cohort with its corresponding control group, as detailed in Table 1.
Table 1 The demographic and clinical characteristics of pwAD and healthy controls (HC) across the English and Persian cohorts.
Characteristic | pwAD | HC | Statistics (P value) |
English cohort | N = 103 | N = 53 | |
Age, mean (SD) | 71.4 (5.2) | 70.2 (4.2) | t = −1.6 (P = 0.12) |
Handedness (percent right-handed) | – | – | – |
Sex (percent female) | 62.3% | 68.0% | χ2 = 1.5 (P = 0.22) |
Persian cohort | N = 25 | N = 25 | |
Age, mean (SD) | 74.9 (4.6) | 71.4 (7.8) | t = −1.9 (P = 0.06) |
Handedness (percent right-handed) | 100% | 96% | χ2 = 1.0 (P = 0.31) |
Sex (percent female) | 72% | 52% | χ2 = 1.7 (P = 0.19) |
Language samples
Connected speech samples were obtained for both languages by asking participants to describe the Cookie Theft picture, a component of the Boston Diagnostic Aphasia Examination.25 This picture was chosen over other alternatives because it aligns well with Persian cultural contexts and has been used in previous Persian studies.26 Since the study concerns the lexicosemantic and syntactic features, disfluencies and false starts were removed based on a protocol previously described.27
Language features
We used Stanza, an open-source Python natural language processing toolkit that supports 66 human languages, including English and Persian, to extract part-of-speech (POS) tags and dependency relationships.29 Figure 2 shows syntactic parsing for determining POS tags and dependency relations in English and Persian. POS tags and dependency relations were normalized by dividing their raw counts by the total number of words each participant produced. Since this method results in hundreds of features, some of which are rarely used, we retained only those features used by at least 10% of all participants. We have previously provided a list of definitions and examples for features extracted from Stanza.30 In addition to the Stanza-derived feature set, we included word length, sentence length, average log frequency of all words and content words, total number of sentences, total number of words, and syntax frequency.27,31
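The normalization and prevalence filter described above can be illustrated with a minimal sketch. It assumes the tags have already been extracted (for example, by Stanza) into one list of labels per participant. The function names `normalize_counts` and `filter_by_prevalence`, the toy tags, and the choice to count every tagged token as a word are assumptions made for illustration, not details taken from the study.

```python
from collections import Counter


def normalize_counts(tags):
    """Return each tag's rate: raw count divided by the participant's total words."""
    total = len(tags)
    if total == 0:
        return {}
    return {tag: n / total for tag, n in Counter(tags).items()}


def filter_by_prevalence(rates_per_participant, min_share=0.10):
    """Keep features used at least once by at least `min_share` of participants."""
    n = len(rates_per_participant)
    usage = Counter()
    for rates in rates_per_participant:
        usage.update(rates.keys())
    kept = sorted(f for f, k in usage.items() if k / n >= min_share)
    # Features a participant never used get a rate of 0.0 so columns line up.
    return [{f: rates.get(f, 0.0) for f in kept} for rates in rates_per_participant]


# Hypothetical per-word POS tags for three participants (toy data).
samples = [
    ["PRON", "VERB", "DET", "NOUN"],
    ["NOUN", "VERB", "ADV", "ADV"],
    ["PRON", "VERB", "NOUN", "NOUN", "INTJ"],
]
rates = [normalize_counts(s) for s in samples]
# A threshold of 0.5 is used only because the toy sample is tiny;
# the text above describes a 10% threshold.
features = filter_by_prevalence(rates, min_share=0.5)
```

With only three toy participants the threshold is raised to 50% so that the filter visibly drops the tags used by a single participant.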
Statistical analysis
To determine the correlation between language features and AD status (0 or 1), we used point biserial correlation. For classification, we used a binary logistic regression model. We employed a leave-one-out cross-validation (LOOCV) approach on our dataset to validate the model's performance. In each iteration of the LOOCV, a single observation is set aside as the test data, and the remaining observations are used to train the model. To minimize the number of variables used in the logistic regression, we used recursive feature elimination (RFE).32 RFE is a feature selection technique used in machine learning, particularly with logistic regression. It is a backward method of selecting predictors that starts with all available features, trains a model, evaluates feature importance, removes the least important features iteratively, and repeats this process until a desired number of features or a stopping criterion is met. RFE reduces dataset dimensionality, enhances model interpretability, and potentially improves generalization by focusing on informative features.
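As an illustration of the first step, the point-biserial coefficient can be computed directly from its definition. It equals the Pearson correlation between a continuous feature and a 0/1 group label. The function name `point_biserial` and the toy values below are assumptions for this sketch and are not the study's code or data.

```python
import math


def point_biserial(values, labels):
    """Point-biserial r between a continuous feature and a 0/1 label."""
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    mean_all = sum(values) / n
    # Population standard deviation of the feature over all participants.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n**2)


# Hypothetical feature rates; label 1 = AD, label 0 = control.
toy_values = [1.0, 2.0, 3.0, 4.0]
toy_labels = [0, 0, 1, 1]
r = point_biserial(toy_values, toy_labels)
```

A positive coefficient means the feature is higher in the group labeled 1, and a negative coefficient means it is lower.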
Language Informativeness Index (LII)
LII measures the semantic similarity between a target text (produced by participants) and a reference text consisting of a meticulously crafted and highly informative description of the picture used for language production. We used a transformer-based model, the “bert-base-nli-mean-tokens,” from the Sentence Transformers library.33 This model is a specialized version of the BERT (Bidirectional Encoder Representations from Transformers) model, tailored for sentence-level embeddings and optimized for natural language processing tasks.34 The “bert-base” variant is a more compact and efficient version of the model, featuring 12 transformer blocks, 768 hidden units, and 12 attention heads. We defined a function that takes two input texts to measure their similarity. Using the BERT model, each text is first transformed into a high-dimensional vector representation (embedding). We then computed the cosine similarity between these embeddings, resulting in a similarity score ranging from 1 (identical in meaning) to −1 (completely dissimilar).
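The final step, cosine similarity between two embedding vectors, can be sketched as follows. The short vectors are hypothetical stand-ins. In the procedure described above, the inputs would be the sentence embeddings of the participant's transcript and of the reference text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings" (real sentence embeddings are much longer).
reference_vec = [0.2, 0.7, 0.1]
sample_vec = [0.1, 0.8, 0.3]
score = cosine_similarity(sample_vec, reference_vec)
```

Because the measure depends only on the angle between the vectors, multiplying either embedding by a positive constant leaves the score unchanged.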
Reference text
Multiple approaches can be taken in generating the reference text, such as synthesizing a text using information units extracted from normative data35 or a predefined list of information units. Here, we generated the reference text with artificial intelligence-based image-to-text tools from OpenAI36 and ensured that it contains all 23 standard information units.37 The list consists of 23 information units in four key categories: subjects, places, objects, and actions. The three subjects were the boy, the girl, and the woman. The two places were the kitchen and the exterior seen through the window. The 11 objects included cookie, jar, stool, sink, plate, dishcloth, water, window, cupboard, dishes, and curtains. Finally, the seven actions or facts were boy taking or stealing, boy or stool falling, woman drying or washing dishes/plate, water overflowing or spilling, action performed by the girl, woman unconcerned by the overflowing, and woman indifferent to the children. Below is the reference text in English.
“In the kitchen, a boy is standing on a stool. He is trying to steal cookies from a cookie jar on a cupboard shelf. The stool is tilted, so the boy is about to fall. A little girl is standing on the floor, reaching up for some cookies. A woman, likely their mother, is washing or drying a plate with a dishcloth. The water is overflowing from the sink. The woman seems unaware or indifferent to the spilling water and the children's actions. Outside the window, there is a little yard with a driveway. The window has curtains. There are some dishes on the counter.”
Results
The language features of AD in one language are generalizable to another language
We first determined the language features with the highest correlation with AD using point-biserial correlations for each language (Fig. 3). In English, the language features with the highest correlation with AD include shorter content words (r = −0.33, P < 0.001), more pronouns (r = 0.28, P < 0.001), more objects (r = 0.28, P < 0.001), more adverbs (r = 0.27, P < 0.001), fewer total words (r = −0.25, P = 0.002), more demonstratives (r = 0.24, P = 0.003), shorter sentences (r = −0.22, P = 0.005), and fewer prepositions (r = −0.22, P = 0.006).
[IMAGE OMITTED. SEE PDF]
To test cross-linguistic transferability, we selected all features with a significant correlation with AD in the English cohort at the alpha level of 0.05 to build a classifier based on a binary logistic regression in the Persian cohort. We did not apply a correction for multiple comparisons at this stage since the RFE method inherently addresses the issue of multiple comparisons. RFE operates iteratively, evaluating the contribution of each feature to the classification model's performance and eliminating the least significant features. By systematically ranking and removing features, RFE effectively prioritizes those with the greatest discriminatory power, reducing the potential for false discoveries. Therefore, the application of RFE minimizes the impact of multiple comparisons by selecting a parsimonious set of features that collectively optimize classification accuracy.
We applied LOOCV for cross-validation, which accounts for potential overfitting and provides a robust evaluation of our classifier's generalization ability. Using the indicators of AD in the English cohort, LOOCV resulted in an average accuracy of 90% in classifying AD in the Persian cohort (average positive predictive value = 92% and average sensitivity = 88%). The final feature set for AD classification after RFE included more demonstratives, shorter sentences, and fewer prepositions.
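The evaluation loop and the three reported metrics can be sketched as follows. To stay self-contained, this sketch replaces logistic regression with a simple nearest-class-mean rule and omits feature elimination. The classifier, the function names, and the toy data are therefore assumptions for illustration and do not reproduce the study's model.

```python
def loocv_predictions(X, y, fit, predict):
    """Hold out each observation once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        preds.append(predict(model, X[i]))
    return preds


def summarize(y_true, y_pred):
    """Accuracy, positive predictive value, and sensitivity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
    }


# Stand-in classifier (NOT logistic regression): nearest class mean.
def fit_centroids(X_train, y_train):
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict_centroid(means, x):
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(x, means[label]))
    return min((0, 1), key=squared_distance)


# Hypothetical single-feature data; label 1 = AD, label 0 = control.
X = [[0.10], [0.12], [0.15], [0.30], [0.33], [0.35]]
y = [0, 0, 0, 1, 1, 1]
preds = loocv_predictions(X, y, fit_centroids, predict_centroid)
metrics = summarize(y, preds)
```

Any classifier exposing the same `fit` and `predict` callables, including a logistic regression, could be passed to `loocv_predictions` without changing the loop.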
To test the bi-directionality of the linguistic transfer, we used language indicators of AD in Persian (Fig. 3B) to classify AD in English. The model resulted in an accuracy of 81% (with an average positive predictive value of 85% and an average sensitivity of 86%). To test whether this numerical difference in classification accuracy between the two directions was statistically significant, we included an interaction term between the language variable (English versus Persian) and the predictive linguistic features in the logistic regression model. This interaction was not statistically significant (all P > 0.05), indicating that the effect of predictive language features on AD status does not significantly differ between English and Persian speakers.
LII shows that the language features of AD are associated with reduced informativeness
First, we assessed the validity of LII by sequentially dropping individual information units from the reference text and evaluating the impact on LII. We manually eliminated each information unit from the text and compared its similarity to the original reference text. As expected, LII progressively declined as a function of sequentially eliminating information units, confirming the sensitivity of the index in gauging language informativeness (Fig. 4A).
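The logic of this validity check can be sketched with a toy stand-in for the embedding model. Here a word-count vector replaces the BERT sentence embedding, and whole sentences are dropped from the end of an abbreviated reference instead of individual information units. Both simplifications, and all names in the block, are assumptions made so the sketch runs without external models.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: lowercase word counts (NOT a BERT sentence embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(c1, c2):
    """Cosine similarity between two word-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Abbreviated sentences adapted from the English reference text.
reference_sentences = [
    "A boy is standing on a stool.",
    "He is trying to steal cookies from a cookie jar.",
    "A woman is washing a plate with a dishcloth.",
    "The water is overflowing from the sink.",
]
reference = " ".join(reference_sentences)

# Drop sentences one at a time from the end and score each ablated text.
scores = []
for kept in range(len(reference_sentences), 0, -1):
    ablated = " ".join(reference_sentences[:kept])
    scores.append(cosine(toy_embed(ablated), toy_embed(reference)))
```

On this toy input the similarity falls at every step as more content is removed, mirroring the direction of the decline described above, although the values are not comparable to LII scores from a sentence-embedding model.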
[IMAGE OMITTED. SEE PDF]
In the English cohort, pwAD produced text with a lower LII (mean = 0.84, SD = 0.10) compared to healthy individuals (mean = 0.90, SD = 0.04), t(140.53) = 5.38, P < 0.001. Similarly, in the Persian cohort, pwAD produced texts with a lower LII (mean = 0.98, SD = 0.005) than healthy individuals (mean = 0.99, SD = 0.003), t(44.16) = 5.48, P < 0.001 (Fig. 4B).
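The fractional degrees of freedom reported above are typical of an unequal-variance (Welch) t-test. The text does not name the test, so this is an inference. Under that assumption, the statistic and the Welch–Satterthwaite degrees of freedom can be sketched from summary statistics alone. The function name `welch_t` and the example inputs are hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1**2 / n1  # squared standard error of group 1's mean
    v2 = sd2**2 / n2  # squared standard error of group 2's mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical groups with equal size and equal spread.
t_stat, dof = welch_t(1.0, 1.0, 10, 0.0, 1.0, 10)
```

Feeding the rounded means and standard deviations from the text into such a function would reproduce the reported values only approximately, because rounding error propagates into both the statistic and the degrees of freedom.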
We then investigated the relationship between the typical language abnormalities of AD and LII. Specifically, we sought to determine whether a high rate of pronoun use, which is a typical feature of AD, also correlates with low informativeness, suggesting the inability to produce informative messages as the origin of this language abnormality. We examined the amount of overlap between the pattern of correlations of language features with AD and the pattern of correlations of the same language features with low LII (language emptiness). As shown in Figure 5, we found a substantial overlap in the patterns of correlation of language features with AD and low informativeness. The ways in which the probability of AD changed with respect to language features are comparable with the ways in which low informativeness changed with respect to these language features. Where the AD graph shows an increase, the low informativeness graph follows in tandem, and this synchronized pattern persists in instances of decline. For instance, in English, the length of content words, the rate of using adverbs, the rate of using pronouns, and sentence length, which had the highest correlations with AD (r = −0.33, P < 0.001; r = 0.27, P = 0.001; r = 0.27, P < 0.001; and r = −0.22, P = 0.005, respectively), showed comparable correlations with language emptiness (r = −0.36, P < 0.001; r = 0.29, P < 0.001; r = 0.24, P = 0.002; and r = −0.22, P = 0.005, respectively).
[IMAGE OMITTED. SEE PDF]
Similarly, in the Persian cohort, sentence length, the rate of pronouns, and the rate of demonstratives which had the highest correlation with AD (r = −0.67, P < 0.001; r = 0.62, P < 0.001; and r = 0.59, P < 0.001, respectively) also showed comparable correlations with language emptiness (r = −0.44, P = 0.001; r = 0.55, P < 0.001; and r = 0.59, P < 0.001, respectively).
Of note, the rate of using objects, which showed poor transferability in predicting AD from English to Persian (Fig. 3), had a correlation of about zero with low informativeness (r = −0.01, P = 0.92). This finding suggests that features that are similar in English and Persian in predicting AD have a high correlation with emptiness and those with poor transferability have a minimal correlation with emptiness.
Discussion
In this work, we sought to elucidate the psycholinguistic foundations of language impairments in the connected speech of people with Alzheimer's disease. Although numerous studies have underscored the diagnostic utility of language analysis in identifying people with or who will develop AD, the mechanisms underlying these language abnormalities remain poorly understood. Language production is a complex process that begins with generating a thought-level message. Once the message is ready for expression, the formulation process unfolds, which entails accessing lexical elements, specifying grammatical relations, and mapping the resultant output onto inflectional and phrasal structures.11 It is currently unclear how the hallmark cognitive impairments of AD affect the process of language production.
To delineate this mechanism, we contrasted two possibilities. The first possibility is that the linguistic abnormalities of AD could be attributed to a deterioration in cognitive functions necessary to establish a given language's specific morphosyntactic rules. The second possibility is that the language anomalies might indicate a more fundamental and universal disruption in the language production process at the level of message formation, transcending surface structures specific to a given language. By comparing native speakers of English and Persian, which are languages with considerable structural differences, we demonstrated that the English linguistic indicators of AD could be used to construct a classification model for AD in Persian with an accuracy of 90%. The high degree of transferability of language indicators of AD from English to Persian suggests that these indicators likely do not stem from a breakdown in language-specific morphosyntactic rules. This conclusion is reinforced by observing shared linguistic indicators of AD across English and Persian, such as reduced use of adjectives despite their different syntactic ordering.
Second, we hypothesized that the primary deficit in AD-related language abnormalities is an inability to construct informative messages, which we tested using the Language Informativeness Index (LII), a novel metric for informativeness. We found robust correlations between typical language indicators of AD and language emptiness in both English and Persian. Crucially, there was substantial overlap in how informativeness and likelihood of AD changed with respect to language features. The few language features that did not show a comparable relationship with LII and AD, such as the rate of using objects, had poor transferability across languages in classifying AD. Our findings suggest that the inability to form a clear, informative message might be the origin of the typical language features of AD, such as increased use of pronouns, increased use of adverbs, shorter words, shorter sentences, increased use of demonstratives (e.g., “this” or “that”), high word frequency, and decreased usage of numerals.
Although language emptiness has been a widely acknowledged finding in AD literature,38–41 several critical research gaps hindered the emergence of a comprehensive understanding of the neurolinguistic underpinnings of language impairments in AD. A major limitation has been the absence of a rigorous method for quantifying language emptiness, which in turn hindered investigations to establish a clear relationship between diminished informativeness and the language indicators of AD. Consequently, the language abnormalities observed in AD have been viewed as separate findings rather than reflections of the same fundamental problem in generating informative messages. The informativeness of language, arguably its most fundamental component, is challenging to quantify. Effective measurement needs to consider both the richness and variability of words,42 as well as their relevance to the topic of speech. Previous approaches often focused on some but not all of these essential components. In this work, we introduced LII as a fully automated approach that provides a graded scoring of language informativeness. The metric is sensitive to the richness, variability, and relevance of words without being bound to a particular wording.
Another notable hurdle in understanding the mechanism of language abnormalities of AD is the scarcity of cross-linguistic approaches in the field. A predominant focus on a single language fails to distinguish between various alternative explanations, such as whether abnormalities stem from surface structures of the language or deeper layers of message formation. These ethnocentric approaches are limited in their scope to provide new insights43 and might even result in misconceptions about language phenomena (for example, see references44,45 for discussions about perspectives on nonfluent aphasia that may stem, at least in part, from an ethnocentric approach to language analysis).
When cross-linguistic similarities are observed in comparative linguistics studies, several potential explanations are considered.13 The explanations include pure chance, borrowing of linguistic elements due to language contact, close branching in a language family tree, or linguistic universals – features common across languages due to inherent properties of human language capacity. The high transferability rate of 90% from English to Persian in classifying AD makes pure chance an unlikely explanation. Additionally, by selecting two linguistically distant languages, we minimized the influence of cultural and typological similarities. Therefore, we propose that the observed commonalities in AD language indicators between English and Persian stem from a shared underlying feature: poor message formation. Language universals are thought to have biological foundations stemming from a genetically determined language faculty shared among speakers of all languages.12 Additionally, as speakers of different languages share a common cognitive apparatus, they are expected to possess a set of foundational properties that underlie language production and comprehension.46 Here, we suggest that message generation is one such biologically plausible language universal that is affected in AD, resulting in its typical language features across English and Persian (and potentially other languages).
In the correlational analyses, we observed modest effect sizes between language features and AD diagnosis in the English cohort, with relatively stronger effect sizes in the Persian cohort. The modest correlations may stem from the complexity of language production, which depends on various cognitive processes that may not be uniformly affected by AD, especially in its early stages. The existence of noise, the heterogeneity of AD, and potential compensatory mechanisms by individuals could also contribute to these modest correlations. The relatively stronger correlations in the Persian cohort may be attributed to the use of additional diagnostic biomarkers, such as brain MRI, to confirm AD more definitively.
In conclusion, our study advances the understanding of the psycholinguistic underpinnings of language abnormalities in Alzheimer's disease. By employing a robust metric of language informativeness based on artificial intelligence and embracing a cross-linguistic approach, the work establishes links between the typical language abnormalities of AD and the inability to produce informative messages. This connection identifies a universal stage of language production, one that transcends specific linguistic structures, as the stage affected in AD. Future research should explore the transferability of acoustic features, such as slowed speech rate and longer pauses,1 across different languages, as these typical characteristics of AD may also indicate reduced informativeness and thus transfer cross-linguistically. Additionally, future studies should extend this approach to a broader range of languages, particularly those from entirely different language families, to further test the universality of language markers associated with Alzheimer's disease. This approach enhances our understanding of the psycholinguistic deficits in AD and could also lead to the development of diagnostic tools across various languages, ultimately promoting health equity and fostering a deeper appreciation of biocultural diversity.
Acknowledgments
We thank Harvard Catalyst, the Harvard Clinical and Translational Science Center (National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health Award UL1 TR002541) for its biostatistician consultation service. We also thank Mahan Rezaee for transcription services in Persian.
Author Contributions
SB, MS, MMP, AK, MG, SR, SB, ZM, and MST participated in the data collection. SB, MS, MMP, AK, MG, SR, and MAD contributed to patients' clinical and neuropsychological assessment. NR conceptualized the idea, analyzed the data, and was a major contributor to writing the manuscript. NR and BCD edited the manuscript. All authors contributed to the initial writing and read and approved the final manuscript.
Conflict of Interest
The authors declare no competing interests.
Consent to Participate
This study particularly champions biocultural diversity through the inclusion of participants from two linguistically and culturally distinct cohorts, thereby advancing the principle of avoiding ethnocentric biases in human data collection.
Data Availability Statement
The English dataset was obtained from the DementiaBank repository. The Persian dataset obtained for this study, in addition to the computational codes in Python, can be accessed upon contacting the senior author at
de la Fuente Garcia S, Ritchie CW, Luz S. Artificial intelligence, speech, and language processing approaches to monitoring Alzheimer's disease: a systematic review. J Alzheimers Dis. 2020;78(4):1547‐1574. doi:
Fraser KC, Meltzer JA, Rudzicz F. Linguistic features identify Alzheimer's disease in narrative speech. J Alzheimers Dis. 2016;49(2):407‐422. doi:
Bittner D, Frankenberg C, Schröder J. Changes in pronoun use a decade before clinical diagnosis of Alzheimer's dementia—linguistic contexts suggest problems in perspective‐taking. Brain Sci. 2022;12(1): [eLocator: 121]. doi:
Szatloczki G, Hoffmann I, Vincze V, Kalman J, Pakaski M. Speaking in Alzheimer's disease, is that an early sign? Importance of changes in language abilities in Alzheimer's disease. Front Aging Neurosci. 2015;7: [eLocator: 195]. doi:
Eyigoz E, Mathur S, Santamaria M, Cecchi G, Naylor M. Linguistic markers predict onset of Alzheimer's disease. EClinicalMedicine. 2020;28: [eLocator: 100583]. doi:
Ahmed S, de Jager CA, Haigh AM, Garrard P. Semantic processing in connected speech at a uniformly early stage of autopsy‐confirmed Alzheimer's disease. Neuropsychology. 2013;27(1):79‐85. doi:
Kothari M, Shah DV, Moulya T, Rao SP, Jayashree R. Measures of lexical diversity and detection of Alzheimer's using speech. Proceedings of the 15th International Conference on Agents and Artificial Intelligence. SCITEPRESS – Science and Technology Publications; 2023:806‐812. doi:
Mueller KD, Koscik RL, Hermann BP, Johnson SC, Turkstra LS. Declines in connected language are associated with very early mild cognitive impairment: results from the Wisconsin registry for Alzheimer's prevention. Front Aging Neurosci. 2018;9: [eLocator: 323117]. doi:
Williams E, Theys C, McAuliffe M. Lexical‐semantic properties of verbs and nouns used in conversation by people with Alzheimer's disease. PLoS One. 2023;18(8): [eLocator: e0288556]. doi:
Říha L. Selection of the most suitable linguistic features for diagnosis of Alzheimer's disease from changes in spoken language production. 2022. Accessed December 3, 2023. https://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva‐53472
Levelt WJM. Speaking: from Intention to Articulation. A Bradford Book; 1989.
Chomsky N. Aspects of the Theory of Syntax. M.I.T. Press; 1969.
Fortson BW IV. Indo‐European Language and Culture: An Introduction.
Soheili‐Isfahani A. Noun Phrase Complementation in Persian. University of Illinois at Urbana‐Champaign; 1976. Accessed January 14, 2024. https://hdl.handle.net/2142/66696
Hajati AK. Ke‐Constructions in Persian: Descriptive and Theoretical Aspects. University of Illinois at Urbana‐Champaign; 1977. Accessed January 14, 2024. https://hdl.handle.net/2142/66700
Dabir‐Moghaddam M. Syntax and Semantics of Causative Constructions in Persian. University of Illinois at Urbana‐Champaign; 1982.
Sedighi A. Persian as a heritage language. In: Sedighi A, Shabani‐Jadidi P, eds. The Oxford Handbook of Persian Linguistics. Oxford University Press; 2018. doi:
Goldberg AE. Words by default: the Persian complex predicate construction. Mismatch Form‐Funct Incongruity Archit Gramm. 2003;1:17‐146.
Goldberg AE. Words by default: optimizing constraints and the Persian complex predicate. In: Annual Meeting of the Berkeley Linguistics Society. Vol 22. Linguistic Society of America; 1996:132–146. Accessed January 14, 2024. http://journals.linguisticsociety.org/proceedings/index.php/BLS/article/download/1360/1144
Becker JT, Boller F, Lopez OL, Saxton J, McGonigle KL. The natural history of Alzheimer's disease. Description of study cohort and accuracy of diagnosis. Arch Neurol. 1994;51(6):585‐594. doi:
American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders.
Pouretemad HR, Khatibi A, Ganjavi A, Shams J, Zarei M. Validation of Addenbrooke's cognitive examination (ACE) in a Persian‐speaking population. Dement Geriatr Cogn Disord. 2009;28(4):343‐347. doi:
Ansari NN, Naghdi S, Hasson S, Valizadeh L, Jalaie S. Validation of a Mini‐Mental State Examination (MMSE) for the Persian population: a pilot study. Appl Neuropsychol. 2010;17(3):190‐195. doi:
McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer's disease. Neurology. 1984;34(7): [eLocator: 939]. doi:
Goodglass H, Kaplan E. Boston Diagnostic Aphasia Examination (BDAE). Lea and Febiger. Distributed by Psychological Assessment Resources; 1983.
Nilipour R. Agrammatic language: two cases from Persian. Aphasiology. 2000;14(12):1205‐1242. doi:
Rezaii N, Mahowald K, Ryskin R, Dickerson B, Gibson E. A syntax‐lexicon trade‐off in language production. Proc Natl Acad Sci USA. 2022;119(25): [eLocator: e2120203119]. doi:
MacWhinney B. The Talkbank project. In: Beal JC, Corrigan KP, Moisl HL, eds. Creating and Digitizing Language Corpora: Volume 1: Synchronic Databases. Palgrave Macmillan UK; 2007:163‐180. doi:
Qi P, Zhang Y, Zhang Y, Bolton J, Manning CD. Stanza: a python natural language processing toolkit for many human languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics; 2020:101‐108. doi:
Rezaii N, Hochberg D, Quimby M, et al. Artificial intelligence classifies primary progressive aphasia from connected speech. Brain. 2024;147(9):3070‐3082. doi:
Rezaii N, Mahowald K, Ryskin R, Dickerson B, Gibson E. Syntactic Rule Frequency as a Measure of Syntactic Complexity: Insights from Primary Progressive Aphasia. 34th Annual CUNY Conference on Human Sentence Processing. Accessed June 14, 2022. https://www.cuny2021.io/2021/02/24/251/
Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46(1):389‐422. doi:
Sentence‐transformers (Sentence Transformers). 2023. Accessed January 15, 2024. https://huggingface.co/sentence‐transformers
Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre‐training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019:4171‐4186. doi:
Pekkala S, Wiener D, Himali JJJ, et al. Lexical retrieval in discourse: an early indicator of Alzheimer's dementia. Clin Linguist Phon. 2013;27(12):905‐921. doi:
OpenAI. GPT‐4 model. 2023 https://www.openai.com/
Croisile B, Ska B, Brabant MJ, et al. Comparative study of oral and written picture description in patients with Alzheimer's disease. Brain Lang. 1996;53(1):1‐19. doi:
Nicholas M, Obler LK, Albert ML, Helm‐Estabrooks N. Empty speech in Alzheimer's disease and fluent aphasia. J Speech Lang Hear Res. 1985;28(3):405‐410. doi:
Obler L, Albert M. Language and aging: a neurobehavioral analysis. In: Beasley D, Davis GA, eds. Aging: Communication Process and Disorders. Grune and Stratton; 1981:107‐121.
Snowdon DA, Kemper SJ, Mortimer JA, Greiner LH, Wekstein DR, Markesbery WR. Linguistic ability in early life and cognitive function and Alzheimer's disease in late life: findings from the nun study. JAMA. 1996;275(7):528‐532. doi:
Ripich DN, Terrell BY. Patterns of discourse cohesion and coherence in Alzheimer's disease. J Speech Hear Disord. 1988;53(1):8‐15. doi:
Rezaii N, Ren B, Quimby M, Hochberg D, Dickerson BC. Less is more in language production: an information‐theoretic analysis of agrammatism in primary progressive aphasia. Brain Commun. 2023;5: [eLocator: fcad136]. doi:
Fraser KC, Linz N, Li B, et al. Multilingual prediction of Alzheimer's disease through domain adaptation and concept‐based language modelling. In: Burstein J, Doran C, Solorio T, eds. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019:3659‐3670. doi:
Rezaii N, Dickerson BC. Artificial intelligence enables a paradigm shift in understanding nonfluent aphasia. 2024. doi:
Rezaii N, Michaelov J, Josephy‐Hernandez S, et al. Measuring sentence information via surprisal: theoretical and clinical implications in nonfluent aphasia. Ann Neurol. 2023;94(4):647‐657. doi:
Moore TE, ed. Cognitive Development and the Acquisition of Language. Academic; 1973.
© 2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Objective
This study aims to elucidate the cognitive underpinnings of language abnormalities in Alzheimer's Disease (AD) using a computational cross‐linguistic approach and ultimately enhance the understanding and diagnostic accuracy of the disease.
Methods
Computational analyses were conducted on language samples from 156 English and 50 Persian speakers, comprising both AD patients and healthy controls, to extract language indicators of AD. Furthermore, we introduced a machine learning‐based metric, the Language Informativeness Index (LII), to quantify empty speech.
Results
Despite considerable disparities in surface structures between the two languages, we observed consistency across language indicators of AD in both English and Persian. Notably, indicators of AD derived from English yielded an accuracy of 90% in classifying AD in Persian. This substantial degree of transferability suggests that the language abnormalities of AD are not tightly linked to the surface structures specific to English. We therefore posited that these abnormalities stem from impairments in a more universal aspect of language production: the ability to generate informative messages independent of the language spoken. Consistent with this hypothesis, we found significant correlations between language indicators of AD and empty speech in both English and Persian.
Interpretation
The findings of this study suggest that language impairments in AD arise from a deficit in a universal aspect of message formation rather than from the breakdown of language‐specific morphosyntactic structures. Beyond enhancing our understanding of the psycholinguistic deficits of AD, our approach fosters the development of diagnostic tools across various languages, enhancing health equity and biocultural diversity.
Details

1 Azad University Science and Research Branch, Tehran, Iran
2 Abrar Institute of Higher Education, Tehran, Iran
3 Institute for Cognitive Science Studies, Tehran, Iran
4 Mashhad University of Medical Science, Mashhad, Iran
5 Shahid Beheshti University of Medical Sciences, Tehran, Iran
6 Iran University of Medical Sciences, Tehran, Iran
7 Massachusetts General Hospital, Harvard Medical School, Boston, USA, Athinoula A. Martinos Center for Biomedical Imaging, Boston, Massachusetts, USA, Massachusetts Alzheimer's Disease Research Center, Boston, Massachusetts, USA
8 Massachusetts General Hospital, Harvard Medical School, Boston, USA, Athinoula A. Martinos Center for Biomedical Imaging, Boston, Massachusetts, USA