Introduction
Brain-computer interfaces (BCIs) enable the control of devices and software applications by interpreting the user's intent from recorded brain signals.[] BCIs may be used by individuals with severe paralysis to complement or replace their existing communication abilities and/or to control devices in their environment.[] Under these circumstances, ideally a BCI user can issue commands whenever needed, without external cueing or assistance, and with high confidence that the command will be executed.[]
Recently, there has been a dramatic emergence of speech BCI capabilities, with audio waveforms and entire sentences translated directly from intracranial recordings.[] These techniques offer a promising path toward restoring communication, but are not intended for device control, a crucial need among individuals who are severely paralyzed.[] Unlike communication BCIs, BCIs for device control must provide low-latency, reliable commands with minimal linguistic context or assistance from language models. Moreover, the use of BCIs to interface with physical devices (e.g., thermostats, robotic assistants) arguably requires even higher decoding performance and stability over time for safety reasons.
Implantable BCIs have enabled increasingly sophisticated functionality for users, but it is challenging to maintain high performance over long periods of time without retraining or recalibration, which can be time-consuming and require intervention by a research team or a caregiver.[] One study demonstrated that recalibration time can be reduced to as little as 2 min of baseline collection in a cursor-control task, supporting the feasibility of a ‘plug-and-play’ BCI.[] However, it is not clear whether long-term decoding without retraining and recalibration is also feasible for speech BCIs. It also remains unknown whether daily baseline data collection, as was necessary in the aforementioned study, can be eliminated altogether, which would give more autonomy to BCI users and allow for broader BCI usage.
Here, we present a ‘plug-and-play’ BCI control system based on real-time decoding of speech-related neural activity from a chronic electrocorticographic (ECoG) implant. Using this BCI system, a study participant with amyotrophic lateral sclerosis (ALS) was able to freely generate a set of control commands (i.e., up, down, left, right, enter, and back). Commands were reliably detected and decoded as the participant navigated a communication board and controlled devices like room lights and streaming TV applications. We found that decoder retraining and recalibration, as well as baseline collection before each session, were not required to sustain high performance over a 3-month study period following initial laboratory calibration. Together, these results provide evidence that an implanted BCI system decoding speech commands can enable a reliable and stable means of controlling a computer and other external devices over a period of several months for individuals with motor speech impairments due to neurological disorders such as ALS.
Results
Real-Time Neural Decoding
A participant in the CortiCom clinical trial (ClinicalTrials.gov Identifier: NCT03567213; see for details) with severe dysarthria due to ALS was able to control external devices (Video , Supporting Information; the participant gave informed consent for the release of video recordings) and navigate a 4 × 8 communication board (Figure ; Video , Supporting Information) using a BCI in real time. No device-related adverse events or serious adverse events have occurred to date. The investigational BCI device is still operating nominally at the time of this report, acquiring good-quality ECoG signals from all but four of the 128 ECoG electrodes. Two 64-channel high-density ECoG arrays were implanted over motor and somatosensory cortical areas (Figures and ). The lateral array primarily covered brain regions responsible for speech-related functions; this array alone was used to decode speech commands in this study. Raw ECoG signals were bandpass filtered between 70 and 170 Hz to estimate high gamma energy (HGE), which has been shown to correlate with neuronal population activity underlying the electrodes[] and has been used widely to decode speech from ECoG signals.[] Event-related increases in HGE were used by the BCI system to determine whether a command had been issued by the user (Figure ). Specifically, the time associated with the local maximum of a 1-s rolling average of channel-averaged HGE was identified. Once a speech event was detected, neural features in a window of HGE spanning 2 s before to 0.5 s after the detected peak were classified using a convolutional neural network (CNN, InceptionTime,[] Figure ). Visual feedback was presented to the participant as soon as decoding results were received by the system. In the communication board navigation task, the participant saw a red highlight move in the corresponding cardinal direction when the decoded command was up, down, left, or right. This red highlight turned green if the command enter was decoded. When a back command was decoded, a yellow highlight replaced any existing highlight, indicating that the current option had been unselected.
The CNN decoding model was trained on data collected during a word production task in which the participant was instructed to read aloud the six commands as they appeared on the screen. Training data collection for this task began 77 days and concluded 120 days post-implant (4 and 3 months prior to real-time usage, respectively). To enable use of the system without recalibration, all data were normalized using the mean and standard deviation of the silence period (0.8 to 0 s prior to stimulus onset) in a syllable repetition task (Note , Supporting Information) collected on a single, arbitrarily chosen day (95 days post-implant) within the aforementioned time frame.
Stable Decoder Performance Over Three Months
Real-time testing began 194 days after implantation, at which point all model parameters were fixed and no retraining of the decoding model took place. On each day of testing, the participant was instructed to issue verbal commands (up, down, left, right, enter, and back) at his own pace to navigate across the communication board toward targets of his choice. HGE features were normalized with statistics collected on Day 95 post-implant. No separate baseline collection/model recalibration took place prior to real-time experiments. We report here performance statistics in a 3-month study period, concluding 285 days post-implant (n = 35 sessions in total). For the first 2 days of real-time usage, two sessions were conducted each day. All other days had a single session lasting just under 5 min on average (Table , Supporting Information).
As all commands were issued overtly by the participant in real time, audio recordings of the participant during online usage were transcribed as ground truth. We define online accuracy as the percentage of real-time classification results matching the transcriptions when a command was indeed issued by the participant. The participant achieved a median accuracy of 90.59% (95% CI: [89.47%, 92.00%], Figure ). Performance held steady across the study period, with no significant relationship between online accuracy and days after implant (y = 0.010x + 88.70, where x denotes days after implant here and below; R2 = 0.006, p = 0.65, Figure ). The median number of correct decodes per minute across sessions was 14.9 (95% CI: [14.0, 15.3], Figure ; the median number of commands issued per minute was 16.49), and no relationship with days after implant could be established (y = −0.001x + 15.11, R2 = 0.001, p = 0.88, Figure ). These findings indicate the stability of our decoder even without retraining or dedicated baseline recalibration.
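The trend statistics above are simple linear regressions of per-session metrics on days post-implant. The following is a minimal sketch of this computation using scipy.stats.linregress; the arrays `days` and `accuracy` are placeholders, not the study's actual session values.

```python
import numpy as np
from scipy.stats import linregress

# Placeholder per-session data; the study's actual values appear in the
# figures and Supporting Information and are not reproduced here.
days = np.array([194, 201, 215, 229, 243, 257, 271, 285])
accuracy = np.array([89.5, 91.2, 90.1, 92.0, 90.6, 89.8, 91.0, 90.4])

fit = linregress(days, accuracy)  # least-squares linear fit
print(f"y = {fit.slope:.3f}x + {fit.intercept:.2f}, "
      f"R2 = {fit.rvalue ** 2:.3f}, p = {fit.pvalue:.2f}")
```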
Separately, we measured the performance of our detection algorithm (Figure ). False detections were defined as speech events detected by the algorithm when the participant did not attempt to say a word. Missed detections were defined as instances where the participant finished a full word that the detection algorithm failed to detect. Across the study period, both the false detection rate and the missed detection rate remained low. The median false detection rate was 0.19 min−1 (95% CI [0.00, 0.23], y = 0.001x − 0.027, R2 = 0.017, p = 0.45), and the median missed detection rate was 0.19 min−1 (95% CI [0.00, 0.20], y = 0.002x − 0.208, R2 = 0.066, p = 0.14). We found no statistically significant linear trend between either of these metrics and days after implantation. Additionally, the median time interval between speech offset and registration of the decoding result by the BCI system was 1.24 s (95% CI: [1.23, 1.25], Figure ). This interval represents the system delay between the participant finishing a command and the system completing the corresponding action.
Stability of the Decoding Signals
To quantify the stability of the underlying neural signals, we examined both the neural features used for training the decoding model and those collected during the real-time testing phase. Figure displays the HGE over a 4-s period, starting 1 s prior to speech onset, for two example electrodes (locations shown in Figure ). Here, HGE is aggregated by month of online communication board usage, or from the training data collection, colorized differently for each case. We observed similar patterns of event-related HGE increase during the training data collection phase and during each month that the BCI system was used in real time (Figure ; Figures and , Supporting Information).
We then compared the similarity between the time series (−1.0 to 1.5 s relative to speech onset) of raw event-related HGE during the model training phase and each day of real-time usage. Figure reports Pearson's correlation coefficients for each channel across days. Despite variation in correlation values across channels, the correlation pattern between neural activity during training and real-time use was relatively stable. A small increasing trend was observed for the channel average (y = 0.001x + 0.410, R2 = 0.183, p < 0.05, Figure ). For n = 37/60 channels, a small (slope < 0.001 day−1) but statistically significant (p < 0.05) increase in correlation scores over time was observed (Figure ). For the remaining n = 23/60 channels, no significant relationship between correlation coefficients and days after implant could be established (Figure ). These findings suggest that the neural signals maintained their similarity to the training data over the course of real-time usage.
Lastly, we monitored the stability of the neural signals in terms of raw HGE. We computed the average HGE across channels for each command (−1.0 to 1.5 s relative to speech onset) during each day of online usage (Figure ). No statistically significant relationship (at p < 0.05) between command-specific HGE and days after implant was observed (Figure ).
Electrode Contribution
Next, we examined which electrodes among those selected made the strongest contributions to decoding performance and stability. We first tested whether similar decoding performance could have been achieved had the ECoG grid covered only motor (including premotor) or only sensory cortices (pre- versus post-central areas). We simulated real-time usage of a motor-only and a sensory-only model using neural activity data from online sessions, detected with the aforementioned methods using all 60 electrodes. Lower accuracies were observed in both the motor-only (median: 81.33%, 95% CI [79.07%, 83.33%]) and sensory-only (median: 70.67%, 95% CI [66.67%, 73.49%]) conditions (Figure , p < 0.0001, Mann–Whitney–Wilcoxon test, two-sided with Bonferroni correction for six comparisons). As with the full model, no statistically significant trend between decoding accuracy and days from implantation could be established (y1 = −0.058x + 95.186, R2 = 0.109, p = 0.06; y2 = −0.016x + 74.159, R2 = 0.004, p = 0.73, where y1 and y2 are accuracies for models trained with sensory and motor electrodes excluded, respectively; Figure , Supporting Information). These findings suggest that the wide coverage of the ECoG grid might have been necessary to achieve the high performance we observed, though performance stability did not seem to be affected by reduced coverage.
We then took a more granular look at which specific electrodes had the greatest influence on decoding. Each electrode's influence was quantified as the relative change in model predictions resulting from small perturbations of the neural activity at that electrode.[] The dorsal and posterior parts of the grid had more influence on the decoding model than the ventral and anterior parts. The most influential electrodes were localized to the dorsal part of the ventral sensorimotor cortex (vSMC), which has been associated with lip movement, and to a lesser extent, to areas associated with tongue and jaw movement.[] We then investigated whether this spatial pattern of electrode influence was stable across the study period (Figure ). Strong correlations were observed between the influence of the electrodes across the three months of real-time usage (Pearson's correlation coefficient, r = 0.985 between months 1 and 2, r = 0.994 between months 1 and 3, r = 0.992 between months 2 and 3; p < 0.0001 for all three pairs).
Generalizing Stable Performance to Functional Control and Silent Speech
Since many aspects of the decoding system design were motivated by the need for ALS patients to interface with computational devices and external hardware, we examined whether the stability and performance of a fixed decoder would hold under hardware-control conditions. In these functional control experiments, options on the communication board were linked to real-world events (e.g., turning a light on and off, contacting caregivers, playing the radio). The participant was also given the ability to control a TV application on a separate screen (Video , Supporting Information), which likewise consisted of 2-D arrays of menu options navigated using the same six commands. Online functional control experiments started 266 days and concluded 285 days post-implant (coinciding with the last month of online communication board usage experiments). The same fixed decoder described above was used without any further training or recalibration. The participant was instructed to complete a series of functional tasks at his own pace in the following order: activate the BCI system (by issuing three enter commands in a row), turn off a lamp, turn on the radio, turn off the radio, activate smart TV control, open a video application, and select a video to watch. Median accuracy across the nine functional control sessions was 86.49% (95% CI: [80.00%, 96.30%]). No statistically significant relationship between functional control accuracy and days after implant could be established (y = −0.450x + 212.21, R2 = 0.278, p = 0.15, Figure ). The median number of correct decodes per minute was 9.8 (95% CI [8.2, 10.7]), with no significant relationship with days after implant (y = 0.018x + 4.975, R2 = 0.016, p = 0.74, Figure ). Note that this rate was substantially lower than the correct decodes per minute reported earlier for communication board control, mainly because the participant required more time to examine his options and, to a lesser extent, to wait for control commands to take effect in the real world. The median false detection rate was 0.71 min−1 (95% CI [0.35, 1.24]), and the median missed detection rate was 0 min−1 (95% CI [0.00, 0.35]). No statistically significant relationship between either metric and days after implant could be established (y1 = −0.061x + 17.543, R2 = 0.405, p = 0.06; y2 = 0.017x − 4.494, R2 = 0.406, p = 0.06, where y1 is the false detection rate and y2 the missed detection rate, Figure ). Together, these findings suggest that the performance and stability of our BCI system also extend to functional control under simulated real-life settings in the laboratory.
We then examined whether stable performance could be achieved if no audible speech was produced. A separate decoder was trained using data collected in a word production task in which the participant silently mimed the words shown on the screen. Data collection for training the silent speech decoder started 83 days and concluded 225 days after implant, across eight different days. The participant was instructed to silently move his articulators and facial muscles in these sessions; all other parameters of the task paradigm were kept consistent with the aforementioned word production task. Online usage of this new decoder started 236 days and concluded 313 days after implant. In these experiments, a cross appeared on the screen for 2 s, prompting the participant to give a silent command chosen freely by him. The decoding window started when the cross disappeared and ended 2.5 s later. Once the cross disappeared, the user interface displayed the same 4 × 8 communication board referenced above in the real-time communication board usage experiments, and the participant was instructed to issue his command at that point. The participant's lip movements were transcribed as ground truth. Once the decoding result was received by the user interface, the highlight on the communication board changed accordingly and the results were presented to the participant for a further 1.5 s before the next cross replaced the communication board display. Median accuracy across the eight silent speech control sessions was 85.18% (95% CI: [76.79%, 93.75%]). No linear trend in decoding accuracy across days after implant could be established (y = −0.010x + 88.437, R2 = 0.002, p = 0.92, Figure ). The median number of correct decodes per minute was 8.16 (95% CI [7.2, 8.5], Figure ). These results indicate that BCI stability can also be achieved in the absence of phonation.
Discussion
In this study, we demonstrated the accuracy and stability of a speech BCI system based on a chronic ECoG implant. No serious adverse events or device-related adverse events occurred during the study period. At the time of this report, the BCI system is still operating nominally, capable of streaming ECoG signals from all but 4 of 128 electrodes. A clinical-trial participant was able to use the system at his own pace to control computer applications and external devices over a 3-month study period without model recalibration or retraining. Previous demonstrations of implanted speech BCIs have focused primarily on restoring communication by translating brain activity into text.[] An outstanding question is whether decoding neural activity during speech commands can also enable direct control of devices, another crucial need among severely paralyzed individuals. In a control-oriented BCI, similar to voice assistants used by able-bodied individuals, each user-issued command needs to be detected and classified with high confidence; otherwise, BCI users may elect to abandon the device.[] Here, we show that a speech BCI using a small set of intuitive commands can indeed achieve high accuracy for navigating a communication board and controlling household devices, without the use of a language model to correct decoding errors.
Another outstanding question for BCI development is whether day-to-day setup time associated with retraining and baseline collection can be reduced while still maintaining robust performance.[] This problem becomes more acute for control-focused BCIs, as their ultimate purpose is to facilitate the independent use of assistive and other smart devices at home. Individuals with severe motor impairments need to be able to reliably control BCI systems whenever the need arises. Previous studies have already demonstrated that the neural representations responsible for consistent motor behaviors that underlie BCI control are stable.[] A few recent studies have reduced the setup time for BCI systems to as little as 2 min.[] Our study builds on these prior works by eliminating the model retraining and baseline recalibration steps altogether, marking a critical step toward the independent usage of a speech BCI for navigational control without the need for ongoing researcher intervention.
What drove the decoding stability in our study? We believe the stability of high gamma responses recorded from the implanted ECoG arrays had an important impact on decoding stability. Previous studies have demonstrated short- and long-term stability of ECoG implants in both humans[] and non-human primates.[] In this study, we extended these findings on chronic ECoG decoding stability to speech BCI in an ALS participant. Our results add to the growing evidence that high gamma responses are highly informative for speech-related motor behaviors,[] and that they are stable, likely because they reflect the collective firing rates of neuronal populations.[] Further studies are needed to explore in detail the stability of these and other spectral features of ECoG signals.
One limitation of our approach was the limited vocabulary used for speech decoding. Although the six commands we adopted in this study were both intuitive and sufficient for controlling grid-based applications, a more comprehensive vocabulary may reduce the time needed to perform each selection. Previous reports on speech BCIs have demonstrated that a larger number of classes can still be accurately decoded.[] The success of speech synthesis directly from neural recordings also suggests the potential for decoding a larger vocabulary.[] The current study was conducted with only a single participant in a phase I clinical trial of the safety and feasibility of an ECoG-based BCI device. Few studies have explicitly tested the effectiveness of an ECoG-based speech BCI in people living with ALS.[] Further studies are needed to verify whether our proposed approach will generalize to other participants with similar conditions. Even though we explored how various other popular BCI approaches would perform in offline simulations, we did not have the opportunity to establish how those approaches would perform in the same online closed-loop experiments over longer periods of time. For example, the addition of lower-frequency information or analytic amplitude extraction of high gamma could potentially have resulted in better online decoding accuracy. Our participant had severe dysarthria due to bulbar dysfunction, with limited intelligibility; nevertheless, he was still capable of phonation and articulation. Even though we were able to achieve high performance with silent speech, it remains to be seen whether the same level of performance can be achieved in people living with ALS who are unable to phonate and/or articulate.
Conclusion
Overall, our work demonstrates the potential of safely using implanted BCIs for intuitive control of external devices over prolonged periods. In future studies, it might be possible to extend the capability of decoding systems that require no recalibration to longer and more diverse voice commands. In a home-use BCI system, stable decoders such as ours could also be used to initiate calibration sessions for more sophisticated, albeit less stable, decoders. By leveraging the signal stability afforded by intracranial ECoG recordings, our results may represent one of the first steps toward independent home use of speech BCIs by individuals with severe paralysis.
Experimental Section
Clinical Trial and Participant
This study included data from one clinical trial participant (ClinicalTrials.gov Identifier: NCT03567213) who gave written informed consent for implantation of the study device and all research activities associated with this study. The use of the device in this study was approved by the US Food and Drug Administration under an Investigational Device Exemption. The study protocol was approved by the Johns Hopkins Medicine Institutional Review Board (IRB00167247). The participant was a right-handed man who was 61 years old at the time of implant. He was diagnosed with ALS ≈8 years prior to the start of the study. The participant had severe progressive dysarthria and dysphagia as a result of bulbar dysfunction. This had also been accompanied by progressive dyspnea. His residual phonation and articulation ability could still support overt speech, albeit with reduced speed and limited intelligibility. In addition, he had experienced progressive upper limb weakness, but his lower limbs were less affected.
Safety and Signal Monitoring
The primary outcome measures of the clinical trial described here were designed to establish the safety and recording viability of the investigational BCI device. The safety outcome was measured by time to device explantation and was considered a success if the device was not explanted due to safety concerns during the study period. At the time of this report, the device had been implanted for 52 weeks. Safety monitoring of the device throughout the study period consisted of visual inspections of the device; review of streamed ECoG signals; assessments of vital signs, mood, and cognition at every study visit; and monthly physical, neurological, and cognitive examinations. No serious adverse events or device-related adverse events had been reported to date. The recording viability of the study device was measured as the number of usable neural signals. All but 4 of the 128 ECoG electrodes had produced usable neural signals throughout the study duration to date. Three electrodes with consistently high impedance (>15 kΩ) were excluded from the analysis; impedances for the remaining electrodes stayed under 15 kΩ throughout the study period. One additional electrode was excluded due to inconsistent signal quality confirmed by visual inspection of the raw signals: on a few occasions, the raw signal recorded from that channel did not exhibit nominal brain-wave patterns and was lower in amplitude than normal. ECoG signals from the remaining 60 electrodes of the array implanted over ventral sensorimotor cortex were used for training and testing decoding algorithms.
Surgical Implantation
Implantation of the investigational device occurred in July 2022 at the Johns Hopkins Hospital without surgical complications. Two 64-channel ECoG grids (PMT Corporation, Chanhassen, MN) were implanted subdurally on the pial surface of the brain, over brain regions responsible for speech and upper extremity movements. Exact placement was informed by anatomical landmarks, preoperative fMRI, and somatosensory evoked potentials. Each ECoG grid had a surface area of 12.11 cm2 (36.66 mm × 33.1 mm) with an 8 × 8 electrode configuration and 4 mm center-to-center spacing, embedded in soft silastic sheets. Each platinum-iridium disc electrode had a thickness of 0.76 mm and an exposed surface diameter of 2 mm. Two wires implanted over the surface of the grids served as a reference for ECoG signal amplification. One percutaneous pedestal connector (Blackrock Microsystems, Salt Lake City, UT) connected to the ECoG grids was surgically anchored to the skull.
Activity Detection
The channel average of the normalized high gamma signal across the 60 included electrodes, integrated over a 1-s window, was calculated every 10 ms and used for the detection of neural activity associated with speech (referred to as the detection signal henceforth, Figure ). A peak in a 3-s rolling buffer of the detection signal was identified if its prominence[] exceeded two times the standard deviation of a 10-s rolling window of the detection signal. A 2.5-s window (2 s before the peak and 0.5 s after the peak) of the processed neural activity was then used for classification of the six speech commands (neural decoding model, Figure ). The same detection algorithm was also used for extracting training samples.
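For illustration, a minimal offline sketch of this prominence-based detector is given below, assuming channel-averaged HGE sampled every 10 ms; all names are illustrative, and the online system used streaming buffers rather than this batch loop.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_speech_events(hge_mean, fs_feat=100):
    """hge_mean: channel-averaged HGE sampled every 10 ms (100 Hz)."""
    # Detection signal: 1-s moving average of the channel-averaged HGE
    win = fs_feat
    detection = np.convolve(hge_mean, np.ones(win) / win, mode="same")

    events = []
    base_len = 10 * fs_feat  # 10-s rolling baseline window
    seg_len = 3 * fs_feat    # emulates the 3-s rolling detection buffer
    for t in range(base_len, len(detection) - seg_len, seg_len):
        baseline = detection[t - base_len:t]
        segment = detection[t:t + seg_len]
        # Keep peaks whose prominence exceeds 2x the baseline std
        peaks, _ = find_peaks(segment, prominence=2 * baseline.std())
        for p in peaks:
            # Reject peaks too close to the baseline floor (online criterion)
            if segment[p] - baseline.min() >= baseline.std():
                events.append(t + p)
    return events  # indices of detected speech events
```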
Neural Decoding Model
For the neural decoding model, a convolutional neural network (CNN) was designed using the InceptionTime architecture.[] Targeted toward time series classification, the InceptionTime model incorporates filters of variable length to capture hierarchical latent structure at different time resolutions. In our implementation of the CNN, six Inception[] blocks were used, each with three Inception modules (Figure ), without neural network ensembling. Inside each Inception module, three sets of convolutions, each with 32 filters with kernel sizes ∈ {5, 11, 23}, were applied after an initial convolution layer with 32 filters of kernel size = 1. A MaxPooling layer with kernel size = 3, followed by a set of convolutions with 32 filters of kernel size = 1, was also used within each module. The outputs of the four sets of convolutions were concatenated to form the output of each module. The final output from the last Inception block was passed to a MaxPooling layer, followed by a fully connected layer that provided the final predicted classification score. The model was implemented in Python 3.8/3.9 using PyTorch v1.10. See Note (Supporting Information) for details on the decoding model.
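A minimal PyTorch sketch of one Inception module matching this description (a kernel-size-1 bottleneck, three parallel convolutions with kernel sizes 5/11/23, and a MaxPool branch) is shown below. Details such as normalization and activation placement are assumptions; the study's exact implementation is described in Note (Supporting Information).

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """One Inception module as described above; layout details assumed."""

    def __init__(self, in_channels, n_filters=32):
        super().__init__()
        # Bottleneck: 32 filters of kernel size 1
        self.bottleneck = nn.Conv1d(in_channels, n_filters, kernel_size=1)
        # Three parallel convolutions with kernel sizes 5, 11, 23
        self.convs = nn.ModuleList([
            nn.Conv1d(n_filters, n_filters, k, padding=k // 2)
            for k in (5, 11, 23)
        ])
        # MaxPool (kernel size 3) followed by a kernel-size-1 convolution
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(in_channels, n_filters, kernel_size=1),
        )
        self.bn = nn.BatchNorm1d(n_filters * 4)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        z = self.bottleneck(x)
        branches = [conv(z) for conv in self.convs] + [self.pool_branch(x)]
        # Concatenate the four branch outputs along the channel dimension
        return self.act(self.bn(torch.cat(branches, dim=1)))

# Example: 60 ECoG channels, 2.5 s of HGE features at 100 Hz -> 250 frames
module = InceptionModule(in_channels=60)
out = module(torch.randn(8, 60, 250))  # -> (8, 128, 250)
```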
Data Collection and Model Training
The overt speech command decoding model was trained on data collected between Day 77 and Day 120 post-implant. The silent speech decoding model was trained on data collected between Day 83 and Day 225 post-implant. The participant was instructed to read aloud or silently mime single text commands as each appeared on a computer monitor (Note , Supporting Information). In total, 30 data collection experiments for overt speech were conducted across 11 days, totaling 142.8 min and 300 trials per command. For silent speech, 43 data collection experiments were conducted across 17 days, totaling 266.6 min and 430 trials per command. Decoding model optimization was performed with the Adam optimizer.[] Model performance was evaluated with different hyperparameter choices by withholding an entire day's data as a validation set.
Data Preprocessing and Real-Time System
ECoG signals were filtered (0.3–7500 Hz), amplified, and digitized using a NeuroPlexE headstage (Blackrock Microsystems, Salt Lake City, UT) attached to the 128-channel percutaneous connector of the implanted investigational device. The headstage was connected by HDMI cable to a Digital NeuroPort Biopotential Signal Processing System (NSP, Blackrock Microsystems, Salt Lake City, UT), where signals were downsampled to 1000 Hz. During real-time usage, data from the NSP were streamed from a ZeroMQ[] server implemented in the signal processing module of BCI2000.[] Subsequent real-time signal processing and model inference were implemented in Python within the ezmsg framework, a directed acyclic messaging pattern (), and deployed on a desktop computer with an Ethernet connection to the host computer of the NeuroPort system. A ZeroMQ subscriber within this framework received and reformatted the data, after which data from the 60 channels used in this study were selected. High gamma signals were extracted using an 8th-order Butterworth bandpass filter between 70 and 170 Hz, and a notch filter between 118 and 122 Hz was applied to remove line noise. The logarithmic power of the high gamma signal over a 50-ms window was computed each time 10 ms of data was received. These processed HGE features were then stored in a decoding data buffer pending the detection of speech intent.
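An illustrative offline reconstruction of this feature chain is sketched below; the online pipeline used causal streaming filters, whereas this sketch uses zero-phase filtering for simplicity, and the notch design approximates the 118–122 Hz band with a single IIR notch at 120 Hz.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

def extract_hge(ecog, fs=1000, win_ms=50, step_ms=10):
    """ecog: (n_samples, n_channels) at 1000 Hz -> log high gamma power."""
    # butter(N=4, btype="bandpass") yields an 8th-order bandpass (order 2N)
    sos = butter(4, [70, 170], btype="bandpass", fs=fs, output="sos")
    hg = sosfiltfilt(sos, ecog, axis=0)
    # Approximate the 118-122 Hz notch: Q = 30 gives a ~4-Hz -3 dB bandwidth
    b, a = iirnotch(120, Q=30, fs=fs)
    hg = filtfilt(b, a, hg, axis=0)
    # Log power in 50-ms windows, advanced every 10 ms
    win, step = win_ms * fs // 1000, step_ms * fs // 1000
    frames = [np.log(np.mean(hg[i:i + win] ** 2, axis=0) + 1e-12)
              for i in range(0, len(hg) - win + 1, step)]
    return np.asarray(frames)  # (n_frames, n_channels)
```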
The channel average of HGE was stored in a separate 1-s circular buffer. Every 10 ms, the time average of the circular buffer was computed and stored in a detection buffer 3 s in duration; a 10-s baseline buffer was updated in the same way. Any local maximum in the detection window whose prominence exceeded two times the standard deviation of the detection window was flagged as a detection. A detected peak was rejected if the difference between the signal peak and the minimum of the baseline buffer was below the standard deviation of the baseline buffer; otherwise, the detected peak time was sent to the decoding data buffer.
Once the detected peak time was received by the decoding data buffer, an additional 0.5 s of data was collected and appended to the 2 s of data prior to peak detection. This 2.5-s window of data was then normalized against the mean and standard deviation of the pre-trial silent period (0.8 to 0 s prior to stimulus onset) collected during a syllable repetition task conducted 95 days after implant. Next, the fixed CNN decoder classified these neural signals into commands. Decoding results with raw scores at or above 0.55 were considered valid and were sent back to the host machine; results with raw scores below 0.55 were not registered and were treated as missed detections if a speech attempt had been made.
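A minimal sketch of this fixed normalization and confidence gating follows; `silence_mean`, `silence_std`, and `classifier` are illustrative names standing for the Day-95 silence-period statistics and the trained CNN (here, any callable returning six class scores).

```python
import numpy as np

COMMANDS = ("up", "down", "left", "right", "enter", "back")

def classify_window(window, silence_mean, silence_std, classifier,
                    threshold=0.55):
    """window: (n_frames, n_channels) HGE features around a detected peak."""
    # Fixed z-scoring against the Day-95 silence statistics; no recalibration
    z = (window - silence_mean) / silence_std
    scores = np.asarray(classifier(z))
    if scores.max() < threshold:
        return None  # below threshold: not registered (a potential miss)
    return COMMANDS[int(np.argmax(scores))]
```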
In the real-time communication board control tasks, for ease of use, the only command considered valid after enter was back; if any other command was decoded, the application treated it as back. In functional control tasks, this feature was not enabled, as directional commands remained valid options after enter.
Online Experimental Task Design
In the real-time communication board control task, the subject was asked to freely choose his own target on a 4 × 8 communication board and navigate toward the target by issuing verbal commands. The application started with a red highlight over one of the icons on the communication board. This highlight turned green if the enter command was received, and would move rightward, leftward, upward, or downward if the right, left, up, or down command was received, respectively. The highlight turned yellow upon receiving the back command.
In the real-time functional control task, operation of the application started with the participant saying three enter commands in a row. The participant was asked to navigate in an application displayed on the screen directly in front of him. This application had an identical layout to the aforementioned communication board but provided different options. A separate screen displaying a streaming TV application was placed to the front left of the participant. The TV application was put into standby mode at the start of the experiment. Once the experiment started, the participant was instructed to first enter the functional control application. He was then asked to navigate to the correct icons by issuing up, down, left, or right commands. After the correct icon was selected, an enter command triggered the functional control associated with the selected icon. If the TV application was selected, all further commands were relayed to the TV application in lieu of the functional control application.
In the silent speech control task, a cross appeared on the screen for 2 s, prompting the participant to give the next command. At the end of the 2 s, the cross was replaced by the communication board application. The participant was instructed to silently mime a freely chosen command immediately after the cross disappeared. Once the decoding result was received by the application, the highlight on the communication board was changed accordingly and displayed for a further 1.5 s until the next cross cue appeared.
Performance Evaluation
Real-time decoding results saved along with neural recordings were used to evaluate online performance. Simultaneous recordings of the participant's speech were transcribed as ground truth. Speech onset and offset times for each command were obtained using an energy-based voice activity detection model and verified during the manual transcription process.
The chance level was calculated for real-time usage without functional control by taking the average of n = 10000 simulations. In these chance-level simulations, a random number generator simulated a six-way random classification. If the simulated result was enter, the next result was corrected to back as the BCI system did in real-time usage. These simulated results were then compared with the ground-truth transcriptions for final chance level calculation. For functional control and offline analysis, the theoretical chance levels were used.
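A sketch of this Monte Carlo chance-level estimate, applying the same enter-to-back correction as the online system, is given below; the `truth` argument stands for the transcribed command sequence, and all names are illustrative.

```python
import numpy as np

COMMANDS = ("up", "down", "left", "right", "enter", "back")
rng = np.random.default_rng(0)

def chance_accuracy(truth, n_sims=10_000):
    """truth: list of transcribed commands -> mean simulated accuracy."""
    accs = []
    for _ in range(n_sims):
        preds = list(rng.choice(COMMANDS, size=len(truth)))
        for i in range(len(preds) - 1):
            # Mirror the online rule: after enter, the next decode is
            # corrected to back
            if preds[i] == "enter":
                preds[i + 1] = "back"
        accs.append(np.mean([p == t for p, t in zip(preds, truth)]))
    return float(np.mean(accs))
```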
Signal Stability Analysis
For the correlation analysis, the average high gamma energy across all trials (−1.0 to 1.5 s relative to speech onset) was first computed during the training data collection phase for each command. For signal stability analysis, HGE was not normalized. For each real-time usage day, Pearson's correlation coefficient was calculated between each decoded command trial and the corresponding command average from the training data. The average correlation across all trials was then computed for each channel. For the HGE analysis, the HGE was computed for each trial (−1.0 to 1.5 s) during every real-time usage day, and the average across time over all trials for each stimulus was then computed.
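A sketch of this per-channel correlation analysis follows, with assumed array shapes and names: training templates are trial-averaged HGE per command, correlated channel-by-channel with each online trial.

```python
import numpy as np
from scipy.stats import pearsonr

def channel_correlations(train_templates, trials, labels):
    """
    train_templates: dict mapping command -> (n_frames, n_channels) array
                     of trial-averaged training HGE
    trials:          (n_trials, n_frames, n_channels) raw HGE, one usage day
    labels:          length-n_trials list of decoded commands
    Returns the per-channel mean Pearson r across trials.
    """
    n_trials, _, n_ch = trials.shape
    r = np.zeros((n_trials, n_ch))
    for i in range(n_trials):
        template = train_templates[labels[i]]
        for ch in range(n_ch):
            r[i, ch], _ = pearsonr(template[:, ch], trials[i, :, ch])
    return r.mean(axis=0)  # (n_channels,) day-level correlation per channel
```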
Electrode Contribution Analysis
For electrode localization and cortical reconstruction, volumetric preoperative MRI and postoperative CT scans were co-registered using Freesurfer.[] To determine electrode contributions to decoding, the gradient of the loss function was calculated with respect to the neural activity input for each trial. The L1-norm of these gradients was then calculated across time, and the average of this value over all trials within each real-time usage month was computed to obtain a final electrode contribution score.
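A minimal PyTorch sketch of this gradient-based contribution score (loss gradient with respect to the input, L1-normed over time and averaged over trials) is shown below; the model handle and tensor shapes are assumptions.

```python
import torch

def electrode_contributions(model, x, y):
    """
    model: trained classifier mapping (n_trials, n_channels, n_frames)
           inputs to (n_trials, 6) class scores
    x:     neural activity inputs; y: (n_trials,) integer class labels
    Returns a per-electrode contribution score.
    """
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()  # gradient of the loss w.r.t. the input
    # L1-norm across time, then average across trials
    return x.grad.abs().sum(dim=2).mean(dim=0)  # (n_channels,)
```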
Tuning Fork Control Experiments
To control for potential acoustic artifacts[] in experiments where overt speech was attempted by the participant, a tuning fork experiment previously reported by Wilson et al. was conducted.[] A 128-Hz tuning fork (fundamental frequency (F0) of the participant's voice was ≈130 Hz) was held next to the participant or gently pressed against the participant's skull. In these tuning fork experiments, no increase in energy was observed near the 128-Hz tuning fork frequency in the ECoG channels used in this study (Figures and , Supporting Information).
Statistical Analysis
Performance data were presented either as raw values for each day or, in the case of ablation studies, as box plots with the box edges representing quartiles. Data outside 1.5 times the interquartile range were shown as outliers, and the maximum and minimum values of non-outliers were shown as box plot whiskers. Each box corresponds to the accuracy over n = 33 testing days for overt real-time experiments. Two-sided Mann–Whitney–Wilcoxon tests with Bonferroni correction were used to assess significance. All statistical tests were performed in Python using SciPy packages.
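For illustration, this significance test can be computed as below; the accuracy arrays are placeholders, and the Bonferroni factor of six matches the six pairwise comparisons reported in the Results.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Placeholder accuracy samples for two conditions (e.g., full grid vs.
# motor-only); n = 33 testing days per box as described above.
rng = np.random.default_rng(0)
full_grid = rng.normal(90, 2, 33)
motor_only = rng.normal(81, 3, 33)

n_comparisons = 6  # Bonferroni factor for the six pairwise comparisons
stat, p = mannwhitneyu(full_grid, motor_only, alternative="two-sided")
p_corrected = min(1.0, p * n_comparisons)
print(f"U = {stat:.1f}, Bonferroni-corrected p = {p_corrected:.3g}")
```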
Acknowledgements
The authors wish to express their sincere gratitude to participant CC01, whose dedication and efforts made this study possible. The authors thank Kathryn Rosenblatt and Chad Gordon for medical and surgical support during device implantation; Alpa Uchil and the rest of the ALS clinic at the Johns Hopkins Hospital and the Johns Hopkins ALS Clinical Trials Unit for their consultations on ALS-related topics and care of the participant; Mathijs Raemaekers and Kevin Nathan for analysis of fMRI data; Yujing Wang for support in brain reconstruction and functional mapping. The authors were supported by the National Institutes of Health under award numbers UH3NS114439 (NINDS) and U01DC016686 (NIDCD).
Conflict of Interest
The authors declare no conflict of interest.
Author Contributions
S.L., M.A., and N.E.C. conceived and designed the experiments. S.L. designed and implemented the detection algorithm and the neural classifier. M.A., G.W.M., S.L., and C.C. implemented the real-time decoding pipeline. C.C., S.L., and S.S. designed the user interface. S.L., M.A., and D.N.C. contributed to data collection. S.L., M.A., D.N.C., and K.W.S. analyzed the data. S.L., K.W.S., and B.A.W. prepared data visualizations. S.L., M.A., C.C., D.N.C., S.S., Q.R., H.H., M.S.F., and N.E.C. contributed to the study methodology. D.C.T. performed speech and language assessment. A.R.W., N.J.M., L.L.C., M.J.V., F.V.T., N.F.R., M.S.F., and N.E.C. contributed to patient recruitment and regulatory approval. W.S.A. and N.E.C. planned and performed device implantation; S.L., D.N.C., B.A.W., F.V.T., N.F.R., and M.S.F. were also involved in surgical planning and intraoperative functional mapping. H.H. supervised signal processing. N.F.R. and N.E.C. contributed to funding acquisition. N.E.C. supervised the study and its conceptualization. S.L., M.S.F., and N.E.C. prepared the manuscript. All authors reviewed and revised the manuscript.
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
J. S. Brumberg, A. Nieto‐Castanon, P. R. Kennedy, F. H. Guenther, Speech Commun. 2010, 52, 367.
J. R. Wolpaw, N. Birbaumer, D. J. Mcfarland, G. Pfurtscheller, T. M. Vaughan, Clin. Neurophysiol. 2002, 113, 767.
M. J. Vansteensel, E. G. M. Pels, M. G. Bleichner, M. P. Branco, T. Denison, Z. V. Freudenburg, P. Gosselaar, S. Leinders, T. H. Ottens, M. A. Van Den Boom, P. C. Van Rijen, E. J. Aarnoutse, N. F. Ramsey, N. Engl. J. Med. 2016, 375, 2060.
C. Pandarinath, P. Nuyujukian, C. H. Blabe, B. L. Sorice, J. Saab, F. R. Willett, L. R. Hochberg, K. V. Shenoy, J. M. Henderson, eLife 2017, 6, [eLocator: e18554].
A. Greenberg, A. Cohen, M. Grewal, Nat. Biotechnol. 2021, 39, 1194.
G. K. Anumanchipalli, J. Chartier, E. F. Chang, Nature 2019, 568, 493.
D. A. Moses, S. L. Metzger, J. R. Liu, G. K. Anumanchipalli, J. G. Makin, P. F. Sun, J. Chartier, M. E. Dougherty, P. M. Liu, G. M. Abrams, A. Tu‐Chan, K. Ganguly, E. F. Chang, N. Engl. J. Med. 2021, 385, 217.
M. Angrick, C. Herff, E. Mugler, M. C. Tate, M. W. Slutzky, D. J. Krusienski, T. Schultz, J. Neural Eng. 2019, 16, [eLocator: 036019].
C. Herff, L. Diener, M. Angrick, E. Mugler, M. C. Tate, M. A. Goldrick, D. J. Krusienski, M. W. Slutzky, T. Schultz, Front. Neurosci. 2019, 13, 1267.
E. M. Mugler, J. L. Patton, R. D. Flint, Z. A. Wright, S. U. Schuele, J. Rosenow, J. J. Shih, D. J. Krusienski, M. W. Slutzky, J. Neural Eng. 2014, 11, [eLocator: 035015].
C. Herff, D. Heger, A. De Pesters, D. Telaar, P. Brunner, G. Schalk, T. Schultz, Front. Neurosci. 2015, 8, 217.
J. G. Makin, D. A. Moses, E. F. Chang, Nat. Neurosci. 2020, 23, 575.
F. H. Guenther, J. S. Brumberg, E. J. Wright, A. Nieto‐Castanon, J. A. Tourville, M. Panko, R. Law, S. A. Siebert, J. L. Bartels, D. S. Andreasen, P. Ehirim, H. Mao, P. R. Kennedy, PLoS One 2009, 4, [eLocator: e8218].
T. Proix, J. Delgado Saa, A. Christen, S. Martin, B. N. Pasley, R. T. Knight, X. Tian, D. Poeppel, W. K. Doyle, O. Devinsky, L. H. Arnal, P. Mégevand, A.‐L. Giraud, Nat. Commun. 2022, 13, 48.
M. P. Branco, E. G. M. Pels, R. H. Sars, E. J. Aarnoutse, N. F. Ramsey, M. J. Vansteensel, F. Nijboer, Neurorehabilit. Neural Repair 2021, 35, 267.
F. R. Willett, D. T. Avansino, L. R. Hochberg, J. M. Henderson, K. V. Shenoy, Nature 2021, 593, 249.
A. L. Benabid, T. Costecalde, A. Eliseyev, G. Charvet, A. Verney, S. Karakas, M. Foerster, A. Lambert, B. Morinière, N. Abroug, M.‐C. Schaeffer, A. Moly, F. Sauter‐Starace, D. Ratel, C. Moro, N. Torres‐Martinez, L. Langar, M. Oddoux, M. Polosan, S. Pezzani, V. Auboiroux, T. Aksenova, C. Mestais, S. Chabardes, Lancet Neurol. 2019, 18, 1112.
D. M. Brandman, T. Hosman, J. Saab, M. C. Burkhart, B. E. Shanahan, J. G. Ciancibello, A. A. Sarma, D. J. Milstein, C. E. Vargas‐Irwin, B. Franco, J. Kelemen, C. Blabe, B. A. Murphy, D. R. Young, F. R. Willett, C. Pandarinath, S. D. Stavisky, R. F. Kirsch, B. L. Walter, A. Bolu Ajiboye, S. S. Cash, E. N. Eskandar, J. P. Miller, J. A. Sweet, K. V. Shenoy, J. M. Henderson, B. Jarosiewicz, M. T. Harrison, J. D. Simeral, L. R. Hochberg, J. Neural Eng. 2018, 15, [eLocator: 026007].
D. Sussillo, S. D. Stavisky, J. C. Kao, S. I. Ryu, K. V. Shenoy, Nat. Commun. 2016, 7, [eLocator: 13749].
D. B. Silversmith, R. Abiri, N. F. Hardy, N. Natraj, A. Tu‐Chan, E. F. Chang, K. Ganguly, Nat. Biotechnol. 2021, 39, 326.
N. Crone, D. Miglioretti, B. Gordon, R. Lesser, Brain 1998, 121, 2301.
S. Ray, N. E. Crone, E. Niebur, P. J. Franaszczuk, S. S. Hsiao, J. Neurosci. 2008, 28, [eLocator: 11526].
G. Buzsáki, C. A. Anastassiou, C. Koch, Nat. Rev. Neurosci. 2012, 13, 407.
D. A. Moses, M. K. Leonard, J. G. Makin, E. F. Chang, Nat. Commun. 2019, 10, 3096.
H. I. Fawaz, B. Lucas, G. Forestier, C. Pelletier, D. F. Schmidt, J. Weber, G. I. Webb, L. Idoumghar, P.‐A. Muller, F. Petitjean, Data Min. Knowl. Discov. 2020, 34, 1936.
K. Simonyan, A. Vedaldi, A. Zisserman, presented at Proc. of the Int. Conf. on Learning Representations (ICLR), Banff, Canada, April 2014.
K. E. Bouchard, N. Mesgarani, K. Johnson, E. F. Chang, Nature 2013, 495, 327.
S. L. Metzger, J. R. Liu, D. A. Moses, M. E. Dougherty, M. P. Seaton, K. T. Littlejohn, J. Chartier, G. K. Anumanchipalli, A. Tu‐Chan, K. Ganguly, E. F. Chang, Nat. Commun. 2022, 13, 6510.
J. E. Huggins, P. A. Wren, K. L. Gruis, Amyotroph. Lateral Sc. 2011, 12, 318.
A. D. Degenhart, W. E. Bishop, E. R. Oby, E. C. Tyler‐Kabara, S. M. Chase, A. P. Batista, B. M. Yu, Nat. Biomed. Eng. 2020, 4, 672.
K. T. Jensen, N. Kadmon Harpaz, A. K. Dhawale, S. B. E. Wolff, B. P. Ölveczky, Nat. Neurosci. 2022, 25, 1664.
J. A. Gallego, M. G. Perich, R. H. Chowdhury, S. A. Solla, L. E. Miller, Nat. Neurosci. 2020, 23, 260.
A. Moly, T. Costecalde, F. Martel, M. Martin, C. Larzabal, S. Karakas, A. Verney, G. Charvet, S. Chabardes, A. L. Benabid, T. Aksenova, J. Neural Eng. 2022, 19, [eLocator: 026021].
V. R. Rao, M. K. Leonard, J. K. Kleen, B. A. Lucas, E. A. Mirro, E. F. Chang, NeuroImage 2017, 153, 273.
H. Lorach, A. Galvez, V. Spagnolo, F. Martel, S. Karakas, N. Intering, M. Vat, O. Faivre, C. Harte, S. Komi, J. Ravier, T. Collin, L. Coquoz, I. Sakr, E. Baaklini, S. D. Hernandez‐Charpak, G. Dumont, R. Buschman, N. Buse, T. Denison, I. van Nes, L. Asboth, A. Watrin, L. Struber, F. Sauter‐Starace, L. Langar, V. Auboiroux, S. Carda, S. Chabardes, T. Aksenova, et al., Nature 2023, 618, 126.
Z. Chao, Y. Nagasaka, N. Fujii, Front. Neuroeng. 2010, 3, 3.
A. D. Degenhart, J. Eles, R. Dum, J. L. Mischel, I. Smalianchuk, B. Endler, R. C. Ashmore, E. C. Tyler‐Kabara, N. G. Hatsopoulos, W. Wang, A. P. Batista, X. T. Cui, J. Neural Eng. 2016, 13, [eLocator: 046019].
G. Milsap, M. Collard, C. Coogan, Q. Rabbani, Y. Wang, N. E. Crone, Front. Neurosci. 2019, 13, 60.
B. Kashyap, M. Horne, P. N. Pathirana, L. Power, D. Szmulewicz, Biomed. Signal Process. Control 2020, 57, [eLocator: 101759].
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, presented at IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA, June 2015.
D. P. Kingma, J. Ba, arXiv:1412.6980 2017, [DOI: https://dx.doi.org/10.48550/arXiv.1412.6980].
P. Hintjens, ZeroMQ: Messaging for Many Applications, O'Reilly Media, Inc., Sebastopol, CA, 2013.
G. Schalk, D. J. Mcfarland, T. Hinterberger, N. Birbaumer, J. R. Wolpaw, IEEE Trans. Biomed. Eng. 2004, 51, 1034.
B. Fischl, NeuroImage 2012, 62, 774.
P. Roussel, G. L. Godais, F. Bocquelet, M. Palma, J. Hongjie, S. Zhang, A.‐L. Giraud, P. Mégevand, K. Miller, J. Gehrig, C. Kell, P. Kahane, S. Chabardés, B. Yvert, J. Neural Eng. 2020, 17, [eLocator: 056028].
A. Bush, A. Chrabaszcz, V. Peterson, V. Saravanan, C. Dastolfo‐Hromack, W. J. Lipski, R. M. Richardson, NeuroImage 2022, 250, [eLocator: 118962].
G. H. Wilson, S. D. Stavisky, F. R. Willett, D. T. Avansino, J. N. Kelemen, L. R. Hochberg, J. M. Henderson, S. Druckmann, K. V. Shenoy, J. Neural Eng. 2020, 17, [eLocator: 066007].
Abstract
Brain‐computer interfaces (BCIs) can be used to control assistive devices by patients with neurological disorders like amyotrophic lateral sclerosis (ALS) that limit speech and movement. For assistive control, it is desirable for BCI systems to be accurate and reliable, preferably with minimal setup time. In this study, a participant with severe dysarthria due to ALS operates computer applications with six intuitive speech commands via a chronic electrocorticographic (ECoG) implant over the ventral sensorimotor cortex. Speech commands are accurately detected and decoded (median accuracy: 90.59%) throughout a 3‐month study period without model retraining or recalibration. Use of the BCI does not require exogenous timing cues, enabling the participant to issue self‐paced commands at will. These results demonstrate that a chronically implanted ECoG‐based speech BCI can reliably control assistive devices over long time periods with only initial model training and calibration, supporting the feasibility of unassisted home use.