© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Patients with Parkinson’s disease commonly experience voice impairment. In this study, we introduce models that classify healthy speakers and patients with Parkinson’s disease from their speech. We used two models: AST (audio spectrogram transformer), a transformer-based speech classification model that has recently outperformed CNN-based models in many fields, and PSLA (pretraining, sampling, labeling, and aggregation), a CNN-based model with high performance in existing speech classification work. We compare and analyze the models from both quantitative and qualitative perspectives. First, quantitatively, PSLA outperformed AST by more than 4% in accuracy, and its AUC was also higher: 97.43% for PSLA versus 94.16% for AST. Second, we qualitatively evaluated the models’ ability to capture the acoustic features of Parkinson’s disease using various CAM (class activation map)-based XAI (eXplainable AI) methods such as GradCAM and EigenCAM. For PSLA, we found that the model focuses well on the muffled frequency band of Parkinson’s speech, and heatmap analysis of false positives and false negatives shows that the speech features remain visually represented even when the model makes incorrect predictions. The contribution of this paper is twofold: we identified a suitable model for diagnosing Parkinson’s disease through speech by comparing two different types of models, and we validated the model’s predictions in practice.
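
The quantitative comparison above rests on two metrics, accuracy and AUC. As a rough illustration of how they are computed for a binary classifier, here is a minimal Python sketch. The function names, the 0.5 decision threshold, and the labels and scores are all made up for this example and are not taken from the study; the abstract does not say how the authors computed their figures.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample is scored
    higher than a random negative sample (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = Parkinson's speech, 0 = healthy speech.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"AUC      = {roc_auc(labels, scores):.4f}")
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.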

Details

Title
Exploring Spectrogram-Based Audio Classification for Parkinson’s Disease: A Study on Speech Classification and Qualitative Reliability Verification
Author
Jeong, Seung-Min 1; Kim, Seunghyun 1; Lee, Eui Chul 2; Kim, Han Joon 3

1 Department of AI & Informatics, Graduate School, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea; [email protected] (S.-M.J.); [email protected] (S.K.)
2 Department of Human-Centered Artificial Intelligence, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea
3 Department of Neurology, Seoul National University College of Medicine, Seoul National University Hospital, Daehak-ro 101, Jongno-gu, Seoul 03080, Republic of Korea
First page
4625
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
14248220
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3085061881