© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Esophageal speech (ES) is a pathological voice that is often difficult to understand. Moreover, recordings of a patient’s voice from before a laryngectomy are rarely available, which complicates the enhancement of this kind of voice. For this reason, most supervised methods for enhancing ES are based on voice conversion toward healthy target speakers, which may not preserve the speaker’s identity. Unsupervised methods for ES, in contrast, mostly rely on traditional filters, which alone cannot remove this kind of noise and are known to produce musical artifacts. To address these issues, a self-supervised method based on the Only-Noisy-Training (ONT) model was applied, which denoises a signal without requiring a clean target. Four experiments were conducted for assessment using Deep Complex UNET (DCUNET) and Deep Complex UNET with Complex Two-Stage Transformer Module (DCUNET-cTSTM), both of which follow the ONT approach. For comparison and to compute the evaluation metrics, the pre-trained VoiceFixer model was used to restore clean wave files of esophageal speech. Although ONT-based methods work better with noisy wave files, the results show that ES can be denoised without clean targets, and hence the speaker’s identity is retained.
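
The abstract does not say how a training pair is obtained when no clean target exists. One common way to do this in only-noisy training is to sub-sample a single noisy waveform into two shorter waveforms and use one as the network input and the other as the target. The sketch below illustrates that idea only; the function name `subsample_pair`, the block size `k`, and the sampling rule are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def subsample_pair(noisy, k=2, seed=None):
    """Split one noisy waveform into an (input, target) pair.

    Hypothetical helper: for every block of k consecutive samples,
    two different positions are drawn at random. The sample at the
    first position goes to the input, the other to the target.
    """
    rng = np.random.default_rng(seed)
    noisy = np.asarray(noisy)
    n_blocks = len(noisy) // k
    # Drop any trailing samples that do not fill a whole block.
    blocks = noisy[: n_blocks * k].reshape(n_blocks, k)
    first = rng.integers(0, k, size=n_blocks)
    # An offset in 1..k-1 guarantees the second position differs.
    offset = rng.integers(1, k, size=n_blocks)
    second = (first + offset) % k
    rows = np.arange(n_blocks)
    return blocks[rows, first], blocks[rows, second]

# Usage: both halves come from the same noisy recording.
noisy = np.arange(10.0)
net_input, net_target = subsample_pair(noisy, k=2, seed=0)
```

Because the two waveforms share the underlying speech but take different samples of the noise, a denoising network can be trained to map one to the other without ever seeing a clean recording.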

Details

Title
Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement
Author
Amarjouf, Madiha 1; El Hassan Ibn Elhaj 1; Chami, Mouhcine 2; Ezzine, Kadria 3; Joseph Di Martino 3

1 Research Laboratory in Telecommunications Systems: Networks and Services (STRS), Research Team: Multimedia, Signal and Communications Systems (MUSICS), National Institute of Posts and Telecommunications (INPT), Av. Allal Al Fassi, Rabat 10112, Morocco; [email protected]
2 Research Laboratory in Telecommunications Systems: Networks and Services (STRS), Research Team: Secure and Mixed Architecture for Reliable Technologies and Systems (SMARTS), National Institute of Posts and Telecommunications (INPT), Av. Allal Al Fassi, Rabat 10112, Morocco; [email protected]
3 LORIA-Laboratoire Lorrain de Recherche en Informatique et ses Applications, B.P. 239, 54506 Vandœuvre-lès-Nancy, France; [email protected] (K.E.); [email protected] (J.D.M.)
First page
6682
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
20763417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3090893335