Abstract

Researchers have recently been pursuing technologies for universal speech recognition and interaction that work well with subtle sounds or in noisy environments. Multichannel acoustic sensors can improve the accuracy of sound recognition but lead to large devices that cannot be worn. To solve this problem, we propose a graphene-based intelligent, wearable artificial throat (AT) that is sensitive to human speech and vocalization-related motions. Its perception of the mixed modalities of acoustic signals and mechanical motions enables the AT to acquire signals with a low fundamental frequency while remaining noise resistant. The experimental results showed that the mixed-modality AT can detect basic speech elements (phonemes, tones and words) with an average accuracy of 99.05%. We further demonstrated its interactive applications for speech recognition and voice reproduction for the vocally disabled. Using an ensemble AI model, it recognized everyday words vaguely spoken by a patient with laryngectomy with an accuracy of over 90%. The recognized content was synthesized into speech and played on the AT to restore the patient's ability to vocalize. Its feasible fabrication process, stable performance, resistance to noise and integrated vocalization make the AT a promising tool for next-generation speech recognition and interaction systems.

The mechanical signals of the laryngeal vocal organ have not been well utilized by human speech processing technology. The authors develop a prototype of a wearable artificial throat that can sense speech- and vocalization-related actions. The results suggest a new technological pathway for speech recognition and interaction systems.

Details

Title
Mixed-modality speech recognition and interaction using a wearable artificial throat
Author
Yang, Qisheng 1; Jin, Weiqiu 2; Zhang, Qihang 1; Wei, Yuhong 1; Guo, Zhanfeng 1; Li, Xiaoshi 1; Yang, Yi 1; Luo, Qingquan 2; Tian, He 1; Ren, Tian-Ling 1

1 Tsinghua University, School of Integrated Circuits and Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China (GRID:grid.12527.33) (ISNI:0000 0001 0662 3178)
2 Shanghai Jiao Tong University, Shanghai Lung Cancer Center, Shanghai Chest Hospital, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293); Shanghai Jiao Tong University, School of Medicine, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293)
Pages
169-180
Publication year
2023
Publication date
Feb 2023
Publisher
Nature Publishing Group
e-ISSN
2522-5839
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2779286015
Copyright
© The Author(s), under exclusive licence to Springer Nature Limited 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.