Abstract

The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancement tasks aimed at improving speech quality. In this study, we propose a monaural speech enhancement model called the channel and temporal-frequency attention UNet (CTFUNet). CTFUNet takes the noisy spectrum as input and produces a complex ideal ratio mask (cIRM) as output. To improve its enhancement performance, we employ multi-scale temporal-frequency processing to extract features from the input speech spectrum. We also utilize multi-conv head channel attention and residual channel attention to capture temporal-frequency and channel features. Moreover, we introduce a channel temporal-frequency skip connection to alleviate information loss between down-sampling and up-sampling. On the blind test set of the first deep noise suppression challenge, CTFUNet achieves better denoising performance than both the challenge champion models and more recent models. Furthermore, it outperforms recent models such as Uformer and MTFAA in both denoising and dereverberation performance.
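The cIRM training target mentioned in the abstract is a standard construct (not specific to CTFUNet): a complex-valued mask M defined so that clean ≈ M · noisy in the STFT domain, with real part (Yr·Sr + Yi·Si)/(Yr² + Yi²) and imaginary part (Yr·Si − Yi·Sr)/(Yr² + Yi²). The sketch below illustrates the target and its application on toy spectra using NumPy; the array shapes and the small stabilizing `eps` are illustrative assumptions, and this is not the paper's network, only the mask it learns to predict.

```python
import numpy as np

def compute_cirm(noisy, clean, eps=1e-8):
    """Complex ideal ratio mask M such that clean ~= M * noisy (complex multiply).

    Derived from M = S / Y expanded into real and imaginary parts;
    eps guards against division by zero in silent bins (an assumption here).
    """
    denom = noisy.real**2 + noisy.imag**2 + eps
    m_real = (noisy.real * clean.real + noisy.imag * clean.imag) / denom
    m_imag = (noisy.real * clean.imag - noisy.imag * clean.real) / denom
    return m_real + 1j * m_imag

def apply_mask(noisy, mask):
    # Element-wise complex multiplication reconstructs the clean spectrum.
    return noisy * mask

# Toy time-frequency spectra (4 frames x 8 frequency bins, shapes are arbitrary).
rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
noise = 0.3 * (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8)))
noisy = clean + noise

mask = compute_cirm(noisy, clean)
recon = apply_mask(noisy, mask)
print(np.allclose(recon, clean, atol=1e-3))  # the ideal mask recovers the clean spectrum
```

In practice a network like CTFUNet predicts `mask` from `noisy` alone; the ideal mask computed here serves only as the supervision target during training.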

Details

Title
Channel and temporal-frequency attention UNet for monaural speech enhancement
Author
Xu, Shiyun¹; Zhang, Zehua¹; Wang, Mingjiang¹

¹ Harbin Institute of Technology, Key Laboratory for Key Technologies of IoT Terminals, Shenzhen, China (GRID: grid.19373.3f) (ISNI: 0000 0001 0193 3564)
Pages
30
Publication year
2023
Publication date
Dec 2023
Publisher
Springer Nature B.V.
ISSN
1687-4714
e-ISSN
1687-4722
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2850407631
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.