This study examined word-level intelligibility differences between the DECTalk and MacinTalk speech synthesizers using the Modified Rhyme Test in an open-format transcription task. Three groups of listeners participated: inexperienced listeners, speech-language pathologists, and speech synthesis experts. Results of the between-subjects ANOVA showed that the expert group correctly identified a significantly higher number of words than each of the other listener groups. For the within-subjects factor of voice, simple effects ANOVA and post hoc contrasts within each group showed that listeners had higher intelligibility scores for the DECTalk male voice, Perfect Paul, than for the MacinTalk male voice, Bruce. No other pairwise gender/age-matched differences were found between the two synthesizers.
KEY WORDS: speech synthesis, DECTalk, MacinTalk, intelligibility, listening experience
During the last two decades, important advances have been made in augmentative/alternative communication (AAC) technology, resulting in electronic communication systems that are more sophisticated and more easily accessed by people with disabilities. Improvements in the quality of voice output allow AAC users to communicate independently in a variety of communicative contexts. Many AAC systems housed in personal computers are now available with built-in software for synthesized speech. This arrangement has several advantages for the user, including lower cost and increased portability.
At the present time, a formant-based speech synthesis system, DECTalk™ (Klatt, 1980; Klatt & Klatt, 1990), is the most widely used speech synthesizer in AAC technology. Studies have shown that DECTalk is the most intelligible speech synthesizer at the word level (Greene, Logan, & Pisoni, 1986; Logan, Greene, & Pisoni, 1989; Mirenda & Beukelman, 1987) and at the sentence level (Mirenda & Beukelman, 1987; Scherz & Beer, 1995). AAC developers have imported DECTalk voices into newer systems because of this high level of intelligibility.
The algorithm employed in DECTalk is based on detailed theoretical foundations from the acoustic theory of speech production (Fant, 1960). For example, DECTalk employs two different sound sources, one for voicing and one for noise. In addition, two sets of resonators are used, a serial configuration for vowels and a parallel configuration for fricatives. In all, 39 different parameters are configured in DECTalk, and they are updated every 5 milliseconds. Extensive language-specific pronunciation rules as well as a dictionary of exceptions increase the likelihood that messages entered as text are spoken correctly. Ten...