Abstract

Visual perception, speech perception, and the understanding of perceived information are linked through complex mental processes. Gestures, as part of visual perception and synchronized with verbal information, are a key component of human social interaction. Even when there is no physical contact (e.g., during a phone conversation), humans still tend to express meaning through movement. Embodied conversational agents (ECAs), as well as humanoid robots, are visual recreations of humans and are thus expected to perform similar behaviour in communication. The behaviour-generation system proposed in this paper is able to specify expressive behaviour that strongly resembles the natural movement performed during social interaction. The system is TTS-driven and fused with the time- and space-efficient TTS engine ‘PLATTOS’. Visual content and content presentation are formulated based on several linguistic features extrapolated from arbitrary input text sequences, together with prosodic features (e.g., pitch, intonation, stress, emphasis, etc.) predicted by several verbal modules in the system. According to the evaluation results, the proposed system can recreate synchronized co-verbal behaviour with a very high degree of naturalness, by ECAs and humanoid robots alike.
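To make the described pipeline concrete, the following is a minimal illustrative sketch (not the authors' code and independent of the actual PLATTOS implementation), assuming that the TTS front-end exposes per-word timing and prosodic predictions and that accented words are mapped to simple co-verbal gesture events aligned with the synthesized speech. All names and structures here (WordFeatures, GestureEvent, plan_gestures) are hypothetical.

```python
# Hypothetical sketch of a TTS-driven co-verbal behaviour pipeline:
# prosodic/linguistic features predicted per word drive gesture events
# whose timing is synchronized with the synthesized speech.
from dataclasses import dataclass
from typing import List


@dataclass
class WordFeatures:
    word: str
    start_ms: int        # word onset in the synthesized audio
    end_ms: int          # word offset
    stressed: bool       # lexical/sentence stress predicted by the TTS front-end
    pitch_accent: bool   # emphasis/pitch accent predicted by the prosody module


@dataclass
class GestureEvent:
    gesture: str         # symbolic gesture label (e.g., "beat", "deictic")
    start_ms: int
    end_ms: int


def plan_gestures(words: List[WordFeatures]) -> List[GestureEvent]:
    """Map accented words to simple beat gestures whose stroke is aligned
    with the word timing provided by the TTS engine."""
    events: List[GestureEvent] = []
    for w in words:
        if w.pitch_accent:
            # Start the preparation phase slightly before the accented word
            # so the gesture stroke coincides with the word itself.
            events.append(GestureEvent("beat", max(0, w.start_ms - 150), w.end_ms))
    return events


if __name__ == "__main__":
    words = [
        WordFeatures("this", 0, 250, False, False),
        WordFeatures("really", 250, 600, True, True),
        WordFeatures("matters", 600, 1100, True, False),
    ]
    for ev in plan_gestures(words):
        print(ev)
```

In a full system such as the one described in the paper, the gesture events would additionally be shaped by further linguistic features and rendered on an ECA or a humanoid robot rather than printed; the sketch only illustrates the TTS-to-behaviour synchronization principle.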

Details

Title
TTS-Driven Synthetic Behaviour-Generation Model for Artificial Bodies
Author
Mlakar, Izidor 1; Kačič, Zdravko 2; Rojc, Matej 2

1 Humatronik d.o.o., Slovenia
2 University of Maribor, Faculty of Electrical Engineering and Computer Science, Slovenia
Publication year
2013
Publication date
Oct 2013
Publisher
Sage Publications Ltd.
ISSN
1729-8806
e-ISSN
1729-8814
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2324875320
Copyright
© 2013. This work is published under the Creative Commons Attribution 3.0 License (http://creativecommons.org/licenses/by/3.0/).