Speech and language characteristics differentiate

INTRODUCTION

Alzheimer's disease (AD) and dementia with Lewy bodies (DLB) are the two most common types of late-onset neurodegenerative dementias.^1,2 Early and accurate differentiation between AD and DLB is important to ensure appropriate management and treatment of the disease,^3,4 but similarities in clinical manifestations often result in difficulties in clinical diagnosis.^2,3 Although diagnostic biomarkers in cerebrospinal fluid and neuroimaging are the most well-validated biomarkers,^3,5,6 they can be invasive, time-consuming, and expensive. Hence, novel approaches for screening candidates who should be examined with the biomarkers would help the differential diagnosis of AD and DLB.

Speech may be a promising data source for developing screening tools and obtaining multifaceted information encompassing prosodic, acoustic, and linguistic characteristics.^7–18 In fact, discernable differences in speech features characterizing these aspects have been reported in patients with different dementia types.^8–11 For instance, speech and language disturbances have been observed in the early stages of AD^8,12,13 and may enable the prediction of its onset.^14,18 In particular, numerous studies have investigated spontaneous, connected speech during picture description tasks^14–16 and reported linguistic differences in AD patients such as increased repetition and reduced informative content related to word-finding difficulties.^7,8,15,16 In addition, although speech and language impairments are not a central feature of DLB,¹⁹ they are also observed in the course of DLB.^9,17,20,21 A study using a communication test evaluated by human experts suggested that DLB patients show discernable differences in language characteristics across all linguistic aspects except the lexico-semantic domain, although these deficits appeared to be less severe than in AD patients.²² In the speech of DLB patients, prosodic differences, such as slower speech rate and longer pauses, have been consistently observed,^9,17,20,21 and these differences are thought to be at least partially related to motor impairment.¹⁷ Furthermore, although no study has examined DLB, acoustic impairments, such as increased variability in voice frequency and amplitude (i.e., jitter and shimmer), are known to be significant features of Parkinson's disease (another form of Lewy body spectrum disorders).^23–26 By contrast, studies of prosodic and acoustic differences in AD patients, such as pauses and variability in voice frequency, have yielded mixed results, with several studies showing no significant difference and others showing statistically significant differences.^27–29 Together, these previous studies suggest different profiles of speech and language impairments between AD and DLB,^17,22,30 but no study has directly compared them using speech analysis.

In this study, we aimed to identify differences in features characterizing speech and language impairments between AD and DLB and to examine the feasibility of using these features to identify and differentiate AD and DLB. On the basis of previous studies, we hypothesized that AD and DLB patients would have different profiles of speech and language impairments. Specifically, we hypothesized that patients with AD would show larger differences in linguistic features compared with DLB patients, and patients with DLB would show larger differences in prosodic and acoustic features than AD patients. A second hypothesis was that these features would be used for reliably classifying patients with AD, those with DLB, and cognitively normal (CN) individuals. We collected speech responses with a tablet-based application from participants consisting of three clinical diagnostic groups: AD, DLB, and CN. We then extracted speech features characterizing acoustic, prosodic, and linguistic aspects. Finally, we tested the first hypothesis by statistically comparing the speech features between the diagnostic groups, and we tested the second hypothesis by assessing the performance of machine-learning models using these features to identify and differentiate AD and DLB.

RESEARCH IN CONTEXT

Systematic Review: We reviewed the literature using medical and academic databases (e.g., PubMed and Google Scholar) and cited relevant review articles. Speech analysis has succeeded in quantifying speech and language impairments in various neurodegenerative diseases including Alzheimer's disease (AD) and dementia with Lewy bodies (DLB). However, whether these features can differentiate AD and DLB has not been directly investigated.
Interpretation: Our results provide initial evidence of (1) discriminative differences in linguistic, prosodic, and acoustic features that would reflect cognitive and motor impairments in AD and DLB, and (2) the feasibility of machine-learning models by using these features to identify and differentiate AD and DLB as an easy-to-perform screening tool.
Future Directions: Future studies may aim to confirm our findings with larger samples and neuropathological biomarkers and investigate the associations of these speech features with other types of dementia as well as Lewy body disorders.

HIGHLIGHTS

AD and DLB showed different profiles of speech and language impairments
AD showed more severe impairments in linguistic aspects compared with DLB
DLB showed more severe impairments in prosodic/acoustic aspects compared with AD
Combining these speech features successfully identified/differentiated AD and DLB

METHODS

Participants

We recruited outpatients from the Department of Psychiatry, University of Tsukuba Hospital, along with the spouses of the patients, and other participants either through local recruiting agencies or community advertisements in Ibaraki, Japan. The patients met the standard research diagnostic criteria for mild cognitive impairment (MCI)/dementia due to AD or Lewy body disease. Specifically, the patients in the AD group fulfilled the National Institute on Aging and Alzheimer's Association core clinical criteria for probable AD dementia⁵ or MCI,³¹ as well as the AD Neuroimaging Initiative criteria for AD or MCI.³² The patients in the DLB group fulfilled McKeith et al.’s clinical diagnostic criteria for probable/possible DLB³ or MCI with Lewy bodies.³³ Therefore, our samples were clinically diagnosed, though their diagnoses were not confirmed by biomarker or postmortem examination. The CN participants were age-matched to the patients and did not meet any of the aforementioned criteria. Participants were excluded if they had diagnoses of other types of dementia (e.g., frontotemporal dementia or vascular dementia) or other serious diseases or disabilities that would interfere with the collection of speech data. Thus, the participants formed three clinical diagnostic groups of cognitive impairment due to AD (AD group) or Lewy body disease (DLB group), and cognitively normal controls (CN group). Patients in the AD and DLB groups ranged from MCI to moderate dementia.³⁴ The participants were administered cognitive and clinical examinations, which comprised 12 variables (see Table 1 for a full list and the Supplementary Methods in the Supporting Information for imaging details). Three psychiatrists (authors TA, KN, and MO), who are experts in dementia and were blind to the results of the speech data analysis, examined each case in terms of the clinical record, as well as the cognitive and clinical measures, and they confirmed the diagnoses.

TABLE 1 Participant demographics and cognitive/clinical measures

	CN (n = 49)			AD (n = 45)			DLB (n = 27)			BF	P value
Age, years	72.3	(3.8)		73.1	(6.7)		75.1	(5.0)		0.519	0.103
Sex, female, n (%)	31	(63.3)		20	(44.4)		12	(44.4)		0.517	0.126
Education, years	13.1	(2.0)		13.1	(2.7)		12.7	(2.8)		0.104	0.735
MCI, n (%)	N/A			25	(55.6)		19	(70.4)		0.608	0.318
Antipsychotic medication, n (%)	0	(0.0)	^D	3	(6.8)	^b	6	(23.1)	^C, ^b	5.12	0.002
Mini-Mental State Examination^a	28.0	(1.6)	^A	23.5	(4.3)	^D,C	26.5	(3.6)	^A	1.82 × 10⁶	<0.001
Frontal Assessment Battery^a	13.6	(2.5)	^A,D	11.0	(3.7)	^C	11.1	(4.1)	^C	79.1	<0.001
Logical Memory-immediate^a	11.1	(3.3)	^A,D	4.4	(3.5)	^D,C	7.9	(3.9)	^A,C	7.85 × 10¹¹	<0.001
Logical Memory-delayed^a	9.2	(3.0)	^A,D	2.2	(2.6)	^D,C	5.9	(4.1)	^A,C	2.46 × 10¹⁵	<0.001
Trail Making Test part-A^a	36.0	(11.7)	^D	53.4	(46.6)		68.5	(54.0)	^C	13.69	0.002
Trail Making Test part-B^a	91.6	(39.9)	^A,D	168.1	(86.4)	^C	183.7	(86.8)	^C	2.56 × 10⁵	<0.001
Clock Drawing Test^a	6.7	(0.8)		6.1	(1.8)		6.6	(1.1)		0.872	0.063
Clinical Dementia Rating	0	(0.0)	^A,D	0.7	(0.3)	^C	0.6	(0.4)	^C	9.21 × 10¹⁹	<0.001
Geriatric Depression Scale^a	3.2	(3.0)		3.6	(3.1)		4.1	(3.9)		0.147	0.497
Activities of Daily Living^a	99.7	(1.2)	^A	98.7	(4.2)	^D,C	97.2	(5.9)	^A	1.47	0.031
Instrumental Activities of Daily Living^a	7.8	(0.6)	^D	6.7	(1.6)		6.1	(2.1)	^C	6.65 × 10³	<0.001
Medial temporal lobe atrophy	0.8	(0.5)	^A	1.6	(1.0)	^D,C	1.1	(0.6)	^A	2.52 × 10³	<0.001

Data are displayed as means (standard deviations), except for categorical data, which are displayed as numbers (percentages). Bold values highlight statistically significant differences (chi-square test, P < 0.05, for categorical data; one-way ANOVA, P < 0.05, for continuous data). Significant differences between individual diagnostic groups (chi-square test, P < 0.05, for categorical data; Tukey-Kramer test, P < 0.05, for continuous data) are marked with A, D, or C (A: different from AD; D: different from DLB; C: different from CN). Logical Memory-immediate and Logical Memory-delayed refer to immediate and delayed recall of Logical Memory Story A from the Wechsler Memory Scale-Revised, respectively.

^a The total score ranges are as follows: Mini-Mental State Examination, 0 to 30; Frontal Assessment Battery, 0 to 18; Logical Memory (immediate and delayed), 0 to 25; Trail Making Test (parts A and B), 0 to 300; Clock Drawing Test, 0 to 7; Geriatric Depression Scale, 0 to 15; Activities of Daily Living, 0 to 100; Instrumental Activities of Daily Living, 0 to 8.

^b Data missing for one participant.

Abbreviations: CN, cognitively normal; AD, Alzheimer's disease; DLB, dementia with Lewy bodies; ANOVA, analysis of variance; BF, Bayes factor; MCI, mild cognitive impairment.

The study was conducted with the approval of the Ethics Committee, University of Tsukuba Hospital (H29-065), and it followed the ethical code for research with humans as stated in the Declaration of Helsinki. All participants provided written informed consent to participate in the study. All examinations were conducted in Japanese. Experiment periods are described in the Supplementary Methods.

Speech data collection and speech features

The participants sat in front of an iPad Air 2 tablet and answered questions presented by a voice-based application on the tablet in a quiet room with low reverberation in a lab setting (for more details, see the Supplementary Methods). The participants performed five speech tasks: counting backwards, subtraction, tasks for phonemic and semantic verbal fluency, and picture description with the Cookie Theft picture from the Boston Diagnostic Aphasia Examination.³⁵ The reasons for selecting these five tasks are described in the Supplementary Materials. Full descriptions of the five speech tasks are provided in Table S1.

From the participants' speech responses to the five tasks, we extracted a total of 42 speech features for each participant, which consisted of 11 prosodic features, 22 acoustic features, and 9 linguistic features on the basis of previous studies on AD, DLB, and Parkinson's disease.^8,9,16,24,28,36 They were extracted from each task response and investigated separately in subsequent analyses. A full list of speech features and tasks from which these features were extracted is given in Table S2. The acoustic and prosodic features were extracted from the speech responses to all five tasks except where otherwise indicated. Specifically, prosodic features included the proportion of pause duration and pitch variation (i.e., inflection) in addition to the phoneme rate in the picture description task. Acoustic features included jitter and shimmer, which measure cycle-to-cycle variations of the fundamental frequency and amplitude, respectively. Increased shimmer and jitter have been reported in patients with neurodegenerative diseases.^23–26,37 Furthermore, we used the variances of first-order derivatives of the first 12 Mel-frequency cepstral coefficients (MFCCs) during the picture description task as acoustic features for building classifiers. We excluded the MFCC features from statistical comparisons between the three diagnostic groups due to the difficulty in interpreting their differences. Linguistic features were extracted from manually transcribed text data. In addition to the number of correct answers during the verbal fluency tasks, the following linguistic features were extracted from response data during the picture description task: the number of filler words, type-token ratio of nouns for measuring vocabulary richness,^16,36 and five features for measuring informative content.^7,8,15,16 A higher value of the type-token ratio indicates greater vocabulary richness.
As for the five features for measuring informative content, we counted the number of unique entities that a participant described in the picture, referred to as information units, for four predefined categories (people, places, objects, and actions), and used the number of information units for each category and the total number of information units as linguistic features after dividing by the speech duration. Details of the calculation methods including preprocessing are described in the Supplementary Methods. In sum, we used all 42 speech features for building classifiers and 30 features (all speech features except for the 12 MFCC-based features) for statistical comparisons.
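To make the feature definitions above concrete, the following minimal sketch computes a relative cycle-to-cycle variation (usable as a jitter- or shimmer-like measure) and a type-token ratio. It assumes that glottal cycle periods, peak amplitudes, and a tokenized noun list are already available. The "mean absolute consecutive difference divided by the mean" formula is one common definition and is not necessarily the one used in this study (the exact calculation is in the Supplementary Methods). The function names and input values are hypothetical.

```python
from statistics import mean


def relative_cycle_variation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean.

    Applied to cycle periods this gives a jitter-like measure; applied to
    cycle peak amplitudes it gives a shimmer-like measure (assumed definition).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


# Hypothetical inputs, for illustration only.
periods_ms = [5.0, 5.2, 4.9, 5.1]        # durations of consecutive glottal cycles
amplitudes = [0.80, 0.76, 0.82, 0.78]    # peak amplitudes of the same cycles
nouns = ["boy", "cookie", "jar", "boy", "stool"]

jitter = relative_cycle_variation(periods_ms)
shimmer = relative_cycle_variation(amplitudes)
ttr = type_token_ratio(nouns)
```

Larger values of the first two measures indicate less stable voicing, and a type-token ratio closer to 1 indicates fewer repeated nouns.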

Statistical analysis

Group differences between CN, AD, and DLB were examined by using the chi-square test followed by a post hoc chi-square test for categorical data and one-way analysis of variance (ANOVA) followed by a post hoc Tukey-Kramer test for continuous data. Sex, disease stage (MCI, dementia), and the use of antipsychotic medication are categorical data, and the other data including all speech features are continuous data. For the multiple testing of the 30 speech features, the Benjamini-Hochberg correction was applied. We calculated a Bayes factor to assess the magnitude of evidence in favor of an alternative hypothesis versus the null hypothesis.³⁸ We also calculated the generalized eta-squared (η²) to assess the effect size of each feature, for which the values 0.01, 0.06, and 0.14 are considered to indicate small, medium, and large effects, respectively.³⁹ Between-group comparisons of speech features after controlling for the use of antipsychotic medication were conducted with one-way analyses of covariance (ANCOVAs). A two-way ANOVA was used to examine the effects of the dementia type (AD, DLB) and disease stage (MCI, dementia) on the speech features. All statistical analyses were performed using R (version 4.0.5) with an alpha value of 0.05 (P < 0.05, two-sided).
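As an illustration of two of the quantities named above, the sketch below implements the Benjamini-Hochberg adjustment and an eta-squared effect size in plain Python. The study itself used R, so this is not the authors' code. The effect size is computed as the between-group sum of squares divided by the total sum of squares, which we assume coincides with the generalized eta-squared in a one-way between-subjects design. The function names and example numbers are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value never exceeds the adjusted value of a larger raw P value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def eta_squared(groups):
    """Between-group sum of squares divided by total sum of squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Hypothetical raw P values for four features.
adjusted_p = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])

# Hypothetical feature values for three groups.
effect = eta_squared([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A feature would then be reported as significant when its adjusted P value is below 0.05, and its effect size read against the 0.01, 0.06, and 0.14 thresholds.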

Machine-learning analysis

To evaluate the feasibility of using speech features to identify and differentiate AD and DLB, we used supervised machine-learning models to classify the clinical diagnostic groups of AD, DLB, and CN via the speech features. The input variables for the model were the 42 speech features. Binary-classification models were evaluated by using accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic curve (AUC) obtained from 10 iterations of a leave-two-subjects-out cross-validation procedure. To reduce overfitting through automatic feature selection, we also performed a sequential forward selection algorithm. For the classification algorithm, we used a support vector machine with a radial basis function kernel⁴⁰ implemented with the Python package scikit-learn (version 0.23.2). The hyperparameters used in this study were determined in a prior study.⁴¹ For missing values, we applied multivariate imputation by chained equations (version 3.13.3).⁴² Spearman's rank correlation coefficient was used to assess the correlations between cognitive scores and output probability measures of the binary-classification models.
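The threshold-based evaluation metrics named above can be sketched as follows. This is a minimal illustration of how accuracy, sensitivity, specificity, and F1 score follow from the confusion counts of a binary classifier. It does not reproduce the study's support vector machine, feature selection, or cross-validation, and it omits the AUC, which requires continuous model outputs. The function name and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, and F1 for a binary task.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),       # true positive rate
        "specificity": tn / (tn + fp),       # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


# Hypothetical labels: 1 = patient, 0 = cognitively normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In a cross-validation setting, one option is to pool held-out predictions across folds before computing these metrics. Whether the study pooled predictions or averaged per-iteration values is not stated here.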

RESULTS

A total of 123 participants met the inclusion criteria. Among the 123 participants, two were excluded from this analysis because they did not complete any of the five speech tasks. This yielded a total of 121 participants comprising three diagnostic groups of 45 AD, 27 DLB, and 49 CN participants (Table 1; Table S3 for additional clinical information on the DLB group; Supplementary Results for a power analysis of the sample size). The AD and DLB groups included 25 and 19 MCI patients, respectively, and their proportions were not statistically significantly different (P = 0.318). Regarding the demographics, age, proportion of female participants, and years of education showed no statistically significant differences among the groups (P > 0.05). The proportion of participants on antipsychotic medication was higher for the DLB group than for the CN group (P = 0.002 among the three groups; P = 0.003 for DLB vs. CN). All 12 cognitive and clinical measures except for the Clock Drawing Test and Geriatric Depression Scale were different among the diagnostic groups (all P < 0.05; Table 1). Detailed information about missing values for the speech data is reported in the Supplementary Results.

We investigated whether the speech features had statistically discernable differences between the clinical diagnostic groups of AD, DLB, and CN. We found that nine speech features showed statistically significant differences between the groups (Benjamini-Hochberg adjusted P < 0.05; Table 2). These nine features all showed a medium effect size (η² > 0.06).³⁹

TABLE 2 Speech features with statistically significant differences between the clinical diagnostic groups

										P value
Feature name	CN (n = 49)			AD (n = 45)			DLB (n = 27)			η²	UNADJ	ADJ
Linguistic
Number of correct answers (words) (Semantic VFT)	17.7	(5.3)	^A,D	13.8	(5.1)	^C	14.6	(5.4)	^C	0.108	0.001	0.018
Type-token ratio (Picture description)	0.81	(0.11)	^A	0.74	(0.14)	^C	0.78	(0.15)		0.054	0.039	0.099
Total number of information units (/s) (Picture description)	0.243	(0.110)	^A	0.190	(0.079)	^C	0.201	(0.098)		0.062	0.024	0.071
Number of information units in the action category (/s) (Picture description)	0.082	(0.049)	^A	0.057	(0.026)	^C	0.064	(0.038)		0.077	0.009	0.047
Number of information units in the place category (/s) (Picture description)	0.015	(0.014)	^A	0.009	(0.009)	^C	0.014	(0.011)		0.052	0.043	0.099
Number of information units in the object category (/s) (Picture description)	0.106	(0.049)	^A	0.082	(0.044)	^C	0.089	(0.044)		0.051	0.046	0.099
Prosodic
Proportion of pause duration (Picture description)	0.386	(0.129)	^D	0.446	(0.143)		0.507	(0.130)	^C	0.110	0.001	0.018
Proportion of pause duration (Semantic VFT)	0.752	(0.074)	^A,D	0.792	(0.079)	^C	0.802	(0.069)	^C	0.078	0.008	0.047
Proportion of pause duration (Subtraction)	0.339	(0.133)	^D	0.396	(0.155)		0.433	(0.177)	^C	0.059	0.032	0.088
Pitch variation (Subtraction)	25.5	(9.5)	^D	22.5	(9.7)		18.8	(8.2)	^C	0.072	0.014	0.047
Phoneme rate (/s) (Picture description)	2.37	(0.57)	^D	2.11	(0.63)		1.92	(0.59)	^C	0.085	0.006	0.047
Acoustic
Jitter (Semantic VFT)	0.067	(0.010)	^D	0.068	(0.009)	^D	0.074	(0.011)	^A,C	0.072	0.012	0.047
Jitter (Picture description)	0.066	(0.010)	^D	0.068	(0.011)		0.074	(0.011)	^C	0.070	0.014	0.047
Shimmer (Picture description)	0.117	(0.017)	^D	0.120	(0.019)	^D	0.130	(0.019)	^A,C	0.075	0.011	0.047

Data are displayed as means (standard deviations). Bold values highlight statistically significant differences (one-way ANOVA, P < 0.05). Significant differences between individual diagnostic groups (Tukey-Kramer test, P < 0.05) are marked with A, D, or C (A: different from AD; D: different from DLB; C: different from CN).

Abbreviations: CN, cognitively normal; AD, Alzheimer's disease; DLB, dementia with Lewy bodies; UNADJ, unadjusted; ADJ, Benjamini-Hochberg adjusted; VFT, verbal fluency task.

Post hoc pairwise comparisons with the Tukey-Kramer test revealed the following patterns of statistically significant differences between the CN group and the AD or DLB groups (Table 2 and Figure 1). Regarding the overall trends, the AD group demonstrated larger differences from the CN group in linguistic features, while the DLB group demonstrated larger differences in acoustic and prosodic features. Specifically, the speech of the AD group showed significant differences in the linguistic features of reduced information units in the action category during the picture description task (P = 0.047) and reduced correct answers of the semantic verbal fluency task (P = 0.018). By contrast, the speech of the DLB group showed significant differences in the prosodic features of a slower phoneme rate (P = 0.047), less pitch variation in the subtraction task (P = 0.047), and increased proportion of pause duration in the semantic verbal fluency and picture description tasks (P = 0.047 and P = 0.018, respectively); DLB speech was also significantly different in the acoustic features of increased jitter in the semantic verbal fluency and picture description tasks (P = 0.047 and P = 0.047, respectively) and increased shimmer in the picture description task (P = 0.047). After controlling for the use of antipsychotic medication, these speech features showed consistent trends: seven of the nine speech features remained significantly different between the groups (Benjamini-Hochberg adjusted P < 0.05; Table S4), and the AD group showed significant differences from the CN group in linguistic features while the DLB group showed significant differences in prosodic and acoustic features (post hoc pairwise comparisons with the Tukey-Kramer test P < 0.05; Table S4).

FIGURE 1. Differences in the linguistic, acoustic, and prosodic features between three clinical diagnostic groups: Alzheimer's disease (AD) patients, dementia with Lewy bodies (DLB) patients, and cognitively normal (CN) individuals. (A) Graphs of linguistic (upper left), acoustic (upper right), and prosodic (lower left and right) features. Boxes indicate the 25th (Q1) and 75th (Q3) percentiles. Whiskers indicate the upper and lower adjacent values that are most extreme within Q3 + 1.5 × (Q3 − Q1) and Q1 − 1.5 × (Q3 − Q1), respectively. The line and diamond in each box represent the median and mean, respectively. Dots outside of the box represent outliers. Horizontal bars indicate significant differences (Tukey-Kramer test: *P < 0.05, **P < 0.01, ***P < 0.001). (B) Radar chart comparing the linguistic, acoustic, and prosodic features of the AD, DLB, and CN groups, scaled according to Z-scores derived from the CN group's means and standard deviations.
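The Z-score scaling described for the radar chart in Figure 1 can be illustrated with a short worked example. The sketch below expresses the AD and DLB group means for the proportion of pause duration in the picture description task (values taken from Table 2) in units of the CN group's standard deviation. The function name is hypothetical, and applying the scaling to group means rather than to individual participants is an assumption made here for brevity.

```python
def z_relative_to_reference(value, ref_mean, ref_sd):
    """Express a value in standard-deviation units of a reference group."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (value - ref_mean) / ref_sd


# CN group mean and SD for the proportion of pause duration
# (picture description task), as listed in Table 2.
cn_mean, cn_sd = 0.386, 0.129

z_ad = z_relative_to_reference(0.446, cn_mean, cn_sd)   # AD group mean
z_dlb = z_relative_to_reference(0.507, cn_mean, cn_sd)  # DLB group mean
```

On this scale the CN group sits at zero by construction, so a larger positive value indicates a larger share of pausing relative to the controls.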

The ANOVA post hoc Tukey-Kramer tests also revealed that the AD and DLB groups had significant differences in two acoustic features: jitter in the semantic verbal fluency (P = 0.026) and shimmer in the picture description tasks (P = 0.045) (Table 2). We further conducted a two-way ANOVA, 2 dementia types (AD, DLB) × 2 disease stages (MCI, dementia), for these two acoustic features. The results showed significant effects of the dementia type, in which both jitter and shimmer were higher in the DLB group compared with the AD group (Figure 2): A two-way ANOVA of the jitter revealed a significant effect of the dementia type (P = 0.014), no significant effect of the disease stage, and no interaction; that of the shimmer revealed significant effects of the dementia type (P = 0.014) and disease stage (P = 0.030), and no interaction.

FIGURE 2. Differences in acoustic features (upper: jitter; lower: shimmer) based on clinical diagnosis (cognitively normal [CN], Alzheimer's disease [AD], dementia with Lewy bodies [DLB]) and disease stages (mild cognitive impairment [MCI], dementia). Boxes indicate the 25th (Q1) and 75th (Q3) percentiles. Whiskers indicate the upper and lower adjacent values that are most extreme within Q3 + 1.5 × (Q3 − Q1) and Q1 − 1.5 × (Q3 − Q1), respectively. The line and diamond in each box represent the median and mean, respectively. Dots outside of the box represent outliers.

We evaluated the model performance by using iterative cross-validation for classifying the three clinical diagnostic groups of AD, DLB, and CN on the basis of speech features (Table 3). The binary-classification models achieved 87.0% accuracy (86.4% sensitivity, 87.6% specificity, 86.4% F1 score, AUC of 0.851) for AD versus CN; 93.2% accuracy (88.9% sensitivity, 95.5% specificity, 90.2% F1 score, AUC of 0.934) for DLB versus CN; and 87.4% accuracy (AUC of 0.833) for AD versus DLB. The three-class classification model for AD, DLB, and CN groups achieved 79.9% accuracy (AUC of 0.873).

TABLE 3 Performance of classification models using speech features

	Mean [95% CI]
	Accuracy (%)	AUC	Sensitivity (%)	Specificity (%)	F1 score (%)
AD vs. CN	87.0 [86.5, 87.5]	0.851 [0.849, 0.853]	86.4 [85.9, 86.9]	87.6 [86.7, 88.4]	86.4 [86.0, 86.9]
DLB vs. CN	93.2 [92.8, 93.6]	0.934 [0.931, 0.937]	88.9 [88.9, 88.9]	95.5 [94.9, 96.1]	90.2 [89.7, 90.7]
AD vs. DLB	87.4 [86.4, 88.3]	0.833 [0.830, 0.836]	N/A	N/A	N/A

Abbreviations: CI, confidence interval; AUC, the area under receiver operating characteristic curve; AD, Alzheimer's disease; DLB, dementia with Lewy bodies; CN, cognitively normal.

To better understand how the models could classify clinical diagnostic groups, we investigated the association between cognitive scores and output measures of each binary-classification model using Spearman's rank correlation coefficient (ρ). The model output measures showed the highest correlation with the Logical Memory-delayed scores (|ρ| = 0.47, P < 0.0001; Table 4) in the model for AD versus CN, with the Trail Making Test part-B scores (|ρ| = 0.51, P < 0.0001; Table 4) in the model for DLB versus CN, and with the Trail Making Test part-A scores (|ρ| = 0.46, P < 0.0001; Table 4) in the model for AD versus DLB.

TABLE 4 The Spearman's rank correlation coefficient (ρ) of cognitive scores and output measures of binary-classification models of AD, DLB, and CN groups, ordered by the absolute correlation coefficients

AD vs. CN			DLB vs. CN			AD vs. DLB
Cognitive measure	|ρ|	P value	Cognitive measure	|ρ|	P value	Cognitive measure	|ρ|	P value
Logical Memory-delayed	0.47	<0.001	Trail Making Test part-B	0.51	<0.001	Trail Making Test part-A	0.46	<0.001
Mini-Mental State Examination	0.44	<0.001	Trail Making Test part-A	0.37	0.001	Trail Making Test part-B	0.27	0.026
Logical Memory-immediate	0.43	<0.001	Logical Memory-immediate	0.33	0.003	Logical Memory-delayed	0.19	0.105
Trail Making Test part-B	0.33	0.001	Logical Memory-delayed	0.32	0.005	Frontal Assessment Battery	0.12	0.296
Trail Making Test part-A	0.25	0.014	Frontal Assessment Battery	0.21	0.062	Logical Memory-immediate	0.09	0.461
Frontal Assessment Battery	0.24	0.019	Mini-Mental State Examination	0.13	0.275	Mini-Mental State Examination	0.03	0.797
Clock Drawing Test	0.19	0.075	Clock Drawing Test	0.11	0.350	Clock Drawing Test	0.01	0.945

Abbreviations: AD, Alzheimer's disease; DLB, dementia with Lewy bodies; CN, cognitively normal.
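The rank correlations in Table 4 relate each model's output measure to a cognitive score. The following minimal sketch shows one way to compute Spearman's ρ in plain Python: it ranks both variables, gives tied values their average rank, and takes the Pearson correlation of the ranks. It is not the authors' implementation, and the function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: model output probability versus a cognitive test score.
model_output = [0.9, 0.7, 0.4, 0.2]
test_score = [2, 5, 9, 12]
rho = spearman_rho(model_output, test_score)
```

Table 4 reports the absolute value of the coefficient, so the sign (whether a higher model output accompanies a higher or lower score) would be discarded after this step.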

DISCUSSION

We investigated speech features characterizing linguistic, prosodic, and acoustic aspects by using data collected from 121 participants in the AD, DLB, and CN groups, and obtained two main findings. First, statistical analysis showed that the AD group differed more from the CN group than the DLB group did in linguistic features, with reductions in informative content and semantic verbal fluency, while the DLB group showed larger differences in prosodic and acoustic features, with reduced phoneme rate, increased pause proportion, fewer inflections (i.e., monotony and dullness of speech in clinical descriptions), and increased variability in voice frequency and amplitude. Second, the combination of these speech features could identify and differentiate AD and DLB by capturing impairments in different cognitive measures: Logical Memory-delayed scores for AD versus CN, Trail Making Test part-B scores for DLB versus CN, and Trail Making Test part-A scores for AD versus DLB. To the best of our knowledge, this is the first study to identify discriminative patterns in speech and language impairments of AD and DLB and to demonstrate the feasibility of using them to identify and differentiate AD and DLB.

The profiles of language and speech impairments differed between the AD and DLB groups, with larger differences from the CN group in the linguistic features of the AD group and in the prosodic and acoustic features of the DLB group. The trends in these differences in each dementia type were consistent with those reported in previous studies.^8,16,24,28,36 Hence, one of our contributions lies in providing empirical evidence of how these impairments differ between AD and DLB via direct comparisons using speech analysis. Furthermore, to the best of our knowledge, this is the first study to suggest the usefulness of shimmer and jitter to differentiate DLB from CN and AD. Numerous studies on speech of Parkinson's disease patients have shown increases in shimmer and jitter,^23–26 which were also suggested to further increase as the disease progresses.²⁶ These differences are thought to be due in part to motor impairments including deteriorated control of respiratory and laryngeal muscles.⁴³ Comparing these acoustic features across Lewy body disorders may provide useful insights into neuropathological mechanisms underlying the speech impairments.²³

Models using speech features achieved high performance in identifying and differentiating the AD and DLB groups. Our models differentiated the AD and DLB groups from the CN group primarily by capturing impairments in Logical Memory-delayed scores and Trail Making Test part-B scores, respectively. While AD is characterized clinically by prominent memory impairment,¹⁹ DLB is characterized by more executive, attentional, and visuospatial impairment relative to memory impairment,¹⁹ which may support the relevance of our results. Aligning with studies on gait and balance analysis for differentiating dementia types,^44–46 our results suggest the possibility of using speech and language characteristics as behavioral markers, reflecting dementia-type-specific cognitive profiles and underlying pathologies. Confirming this suggestion will require further study with validated neuropathological biomarkers. From a clinical perspective, a classification model using speech analysis may assist clinicians in differential diagnosis as a screening tool because speech data can be acquired in routine clinical practice. In fact, a number of studies on speech analysis for data collected during neuropsychological assessment have succeeded in detecting MCI and dementia,^14,16,47–49 and several of them have shown that adding speech analysis to the neuropsychological assessment improves its accuracy.^48,49 Another clinical implication of this study is that our findings using the tablet-based application might help develop a self-administered tool for the early detection of AD and DLB. According to the World Alzheimer Report published in 2021,⁵⁰ 83% of clinicians maintain that the COVID-19 pandemic has delayed access to diagnostic assessments. Thus, a self-administered screening tool may be especially important in the current COVID-19 pandemic given the difficulties of an in-person evaluation in a clinical setting.
In our future research, we will endeavor to investigate the operability and acceptability of real-world data collection.

This study has several limitations. First, our analysis did not include neuropathological changes assessed using validated biomarkers or postmortem follow-up. A significant proportion of dementia patients have mixed Lewy body and AD pathological changes.² Because we diagnosed DLB with clinical diagnostic criteria, we may have included DLB patients with mixed pathology. The association of speech and language characteristics with neuropathological changes warrants further research. Second, our dataset had a relatively small sample size and was imbalanced among the clinical diagnostic groups. The generalizability of our findings should be confirmed with larger samples collected at multiple sites. Third, residual confounding may still exist, and matching would have strengthened our findings; for example, the DLB group had higher Mini-Mental State Examination scores than the AD group. Finally, we collected the speech data in a lab setting, and the controlled setting might have influenced how the participants responded to questions.

In summary, our results provide initial, empirical evidence of (1) the discriminative differences in speech features characterizing acoustic, prosodic, and linguistic aspects in AD and DLB, and (2) the feasibility of using these speech features as a screening tool for identifying and differentiating AD and DLB. Our findings require further validation with neuropathological biomarkers or postmortem follow-up.

ACKNOWLEDGMENTS

This work was supported by the Japan Society for the Promotion of Science, KAKENHI (grant number 19H01084). The funders did not play any active role in the scientific investigation and reporting of the study.

CONFLICT OF INTEREST

YY is employed by the IBM Corporation.

KS is employed by the IBM Corporation.

MN received funding from the Japan Society for the Promotion of Science, KAKENHI (grant number 19H01084).

MO has nothing to disclose.

KN has nothing to disclose.

TA received funding from the Japan Society for the Promotion of Science, KAKENHI (grant number 19H01084). TA reports honoraria for lectures from Eisai, Daiichi-Sankyo, and Sumitomo Pharma.

Author disclosures are available in the Supporting Information.

Word count: 5071

© 2022. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Introduction

Early differential diagnosis of Alzheimer's disease (AD) and dementia with Lewy bodies (DLB) is important, but it remains challenging. Different profiles of speech and language impairments between AD and DLB have been suggested, but direct comparisons have not been made.

Methods

We collected speech responses from 121 older adults comprising AD, DLB, and cognitively normal (CN) groups and investigated their acoustic, prosodic, and linguistic features.

Results

The AD group showed larger differences from the CN group than the DLB group in linguistic features, while the DLB group showed larger differences in prosodic and acoustic features. Machine-learning classifiers using these speech features achieved 87.0% accuracy for AD versus CN, 93.2% for DLB versus CN, and 87.4% for AD versus DLB.

Discussion

Our findings indicate the discriminative differences in speech features in AD and DLB and the feasibility of using these features in combination as a screening tool for identifying/differentiating AD and DLB.

Details

Title

Speech and language characteristics differentiate Alzheimer's disease and dementia with Lewy bodies

Author

Yamada, Yasunori¹; Shinkawa, Kaoru¹; Nemoto, Miyuki²; Ota, Miho²; Nemoto, Kiyotaka²; Arai, Tetsuaki²

¹ Digital Health, IBM Research, Chuo-ku, Tokyo, Japan
² Department of Psychiatry, Division of Clinical Medicine, Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan

Section

DIAGNOSTIC AND PROGNOSTIC ASSESSMENT

Publication year

2022

Publication date

2022

Publisher

John Wiley & Sons, Inc.

e-ISSN

23528729

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1002/dad2.12364

ProQuest document ID

2758363475
