
Abstract

Can we rebuild the bridge between brain and voice, restoring human communication for people with paralysis? This thesis outlines our translational systems that restore speech to individuals with vocal-tract paralysis.

Speech neuroprostheses have the potential to restore communication and embodiment to individuals living with paralysis, but achieving naturalistic speed and expressivity has remained elusive. The advances presented in this thesis enabled a clinical trial participant with severe limb and vocal paralysis to "speak again" for the first time in more than 18 years, using an AI "brain-to-voice" decoder that restores their pre-injury voice. We use high-density surface recordings of the participant's speech cortex to achieve high-performance, large-vocabulary, real-time decoding across three complementary speech-related output modalities: text, speech audio, and facial-avatar animation. Leveraging advances in machine learning for automatic speech recognition and synthesis, we trained and evaluated deep-learning models on neural data collected as the participant attempted to silently speak sentences, enabling decoding speeds approaching natural conversational rates. We also demonstrate brain control of virtual orofacial movements for speech and non-speech communicative gestures via a high-fidelity "digital talking avatar."

Building on the above advances in high-performance brain-to-speech decoding, I outline our findings demonstrating low-latency, continuously streaming brain-to-voice synthesis with neural decoding in 80-ms increments. The recurrent neural network transducer models demonstrated implicit speech detection capabilities and could continuously decode speech indefinitely, enabling uninterrupted use of the decoder and further increasing speed. Our framework also successfully generalized to other silent-speech interfaces, including single-unit recordings and electromyography.
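The streaming approach described above can be illustrated with a minimal sketch: neural features arrive continuously and are consumed in fixed 80-ms increments by a stateful, causal model whose hidden state carries over between chunks, so decoding can run indefinitely without resetting. Everything here is a hypothetical stand-in (a simple recurrent update, an assumed 10-ms feature frame rate, and an assumed 253-channel feature vector), not the thesis's actual recurrent neural network transducer.

```python
import numpy as np

CHUNK_MS = 80
FRAME_MS = 10                        # assumed neural-feature frame rate
FRAMES_PER_CHUNK = CHUNK_MS // FRAME_MS

class StreamingDecoder:
    """Toy causal decoder: hidden state persists across 80-ms chunks."""

    def __init__(self, n_features: int, n_hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(scale=0.1, size=(n_features, n_hidden))
        self.w_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        self.state = np.zeros(n_hidden)          # carried between chunks

    def decode_chunk(self, chunk: np.ndarray) -> np.ndarray:
        """Consume one 80-ms block of frames; emit one output per frame."""
        outputs = []
        for frame in chunk:                       # frame: (n_features,)
            self.state = np.tanh(frame @ self.w_in + self.state @ self.w_rec)
            outputs.append(self.state.copy())     # placeholder for voice output
        return np.stack(outputs)

# Simulated continuous use: stream 960 ms of features in 80-ms chunks.
decoder = StreamingDecoder(n_features=253)        # 253 channels is an assumption
stream = np.random.default_rng(1).normal(size=(96, 253))   # 96 x 10-ms frames
emitted = [decoder.decode_chunk(stream[i:i + FRAMES_PER_CHUNK])
           for i in range(0, len(stream), FRAMES_PER_CHUNK)]
```

Because the state is never reset between chunks, the loop models the "uninterrupted use" property: the decoder processes an arbitrarily long stream with a fixed 80-ms output latency per chunk.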

Together, the findings in this thesis introduce a multimodal, low-latency speech-neuroprosthetic approach with substantial promise for restoring full, embodied communication to people with severe paralysis. 

A video overview of our brain decoding technique and impact can be found at this link.

Details

Title
AI-Driven Speech Neuroprostheses for Restoring Naturalistic Communication and Embodiment
Number of pages
107
Publication year
2025
Degree date
2025
School code
0028
Source
DAI-B 87/3(E), Dissertation Abstracts International
ISBN
9798293892686
Committee member
Khanna, Preeya; Ng, Ren
University/institution
University of California, Berkeley
Department
Electrical Engineering & Computer Sciences
University location
United States -- California
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32173285
ProQuest document ID
3256768803
Document URL
https://www.proquest.com/dissertations-theses/ai-driven-speech-neuroprostheses-restoring/docview/3256768803/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic