Introduction
The diagnosis of Parkinson’s Disease (PD) traditionally relies on clinical assessments focused on an individual’s motor symptoms1. These methods, while effective, often miss the subtle early symptoms of the disease, leading to delayed intervention2. The situation is further exacerbated by limited access to specialized neurological care, particularly in regions with very low ratios of neurologists to population. For instance, Bangladesh had only 86 neurologists for over 140 million people in 20143, while some African nations had one neurologist per three million people, with 21 countries having fewer than five neurologists each4. Given the expected doubling of PD cases by 20305, there is a pressing need for accessible, home-based diagnostic solutions to address global disparities in healthcare access.
Recent advancements have seen a shift towards integrating digital biomarkers into automated, AI-based at-home PD detection and progression-tracking tools6, 7, 8, 9–10. Techniques range from sensor-based collection of nocturnal breathing signals9 and accelerometric data11 to digital analysis of facial expressions12. However, wearables and sensors may be inconvenient for the elderly13,14, and posed expressions can miss subtle diagnostic cues.
Alternatively, speech analysis offers a non-invasive route, leveraging natural speech patterns for PD detection. Traditional speech analysis in PD has primarily relied on sustained phonation tasks6,15, 16, 17, 18, 19, 20–21, which, although useful, do not reflect the complexities of natural speech. To counter this, studies have proposed building PD classifiers from continuous speech using technologies such as CNNs22, time-frequency analysis23, and SVMs24. However, these studies rely on fixed recording setups and small sample sizes, which limits the generalizability of the models and fails to adequately address accessibility concerns. Even with larger datasets, models like CNNs and SVMs face structural limitations. CNNs, while powerful for feature extraction, are primarily designed for spatial data, and their application to time-series or speech data can be challenging unless properly adapted25,26. They may require deep architectures to capture complex temporal dependencies in PD speech data, increasing the risk of overfitting if the model is not properly regularized or if the dataset lacks sufficient variability to cover real-world...