Abstract

This paper addresses the challenge of online blind speaker separation in a multi-microphone setting. The linearly constrained minimum variance (LCMV) beamformer is selected as the backbone of the separation algorithm due to its distortionless response and its capacity to steer nulls towards interfering sources. A specific instance of the LCMV beamformer that accounts for acoustic propagation is implemented: the relative transfer functions (RTFs) associated with each speaker of interest are used as the steering vectors of the beamformer. A control mechanism, comprising speaker activity detection and direction of arrival (DOA) estimation branches, is devised to ensure robust estimation of the beamformer’s building blocks. This control mechanism is implemented as a multi-task deep neural network (DNN). The primary task classifies each time frame by speaker activity: no active speaker, a single active speaker, or multiple active speakers. The secondary task is DOA estimation. It is implemented as a classification task and is executed only for frames that the primary branch classifies as single-speaker frames; the direction of the active speaker is assigned to one of several ranges of angles. These single-speaker frames are also used to estimate the RTFs with subspace estimation methods. A library of RTFs associated with the DOA ranges is then constructed, facilitating rapid acquisition of new speakers and efficient tracking of existing speakers. The proposed scheme is evaluated on both simulated and real-life recordings, encompassing static and dynamic scenarios. The benefits of the multi-task approach are showcased, and significant improvements are evident even when the control mechanism is trained with simulated data and tested with real-life data. A comparison between the proposed scheme and the independent low-rank matrix analysis (ILRMA) algorithm reveals significant improvements in static scenarios. Furthermore, the tracking capabilities of the proposed scheme are highlighted in dynamic scenarios.
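
As an illustration of the beamformer described in the abstract, the following is a minimal sketch of the textbook LCMV solution for a single frequency bin, w = Φ⁻¹C (CᴴΦ⁻¹C)⁻¹ g, with the RTFs of the speakers stacked as the columns of the constraint matrix C. It is not taken from the paper: the function name `lcmv_weights`, the array shapes, and the random test data are assumptions made for this example only, and the paper's actual estimation of the RTFs and noise statistics is not reproduced here.

```python
import numpy as np


def lcmv_weights(rtfs, noise_cov, desired):
    """Textbook LCMV weights for one frequency bin (illustrative sketch).

    rtfs      : (M, K) complex array, one RTF steering vector per speaker
    noise_cov : (M, M) Hermitian positive-definite noise covariance
    desired   : (K,) desired response per speaker (1 = keep, 0 = null)
    """
    # Phi^{-1} C, computed with a linear solve instead of an explicit inverse
    phi_inv_c = np.linalg.solve(noise_cov, rtfs)
    # K x K matrix C^H Phi^{-1} C
    gram = rtfs.conj().T @ phi_inv_c
    # w = Phi^{-1} C (C^H Phi^{-1} C)^{-1} g
    return phi_inv_c @ np.linalg.solve(gram, desired)


# Hypothetical data: M = 4 microphones, K = 2 speakers
rng = np.random.default_rng(0)
rtfs = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
rtfs = rtfs / rtfs[0]  # RTF convention: reference-microphone entry equals 1

a = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
noise_cov = a @ a.conj().T + np.eye(4)  # Hermitian, positive definite

# Extract speaker 0 without distortion and place a null on speaker 1
w = lcmv_weights(rtfs, noise_cov, np.array([1.0, 0.0]))
response = rtfs.conj().T @ w  # constraint check: should be close to [1, 0]
```

The vector `response` evaluates the linear constraints Cᴴw = g, so a value near 1 for the first speaker and near 0 for the second reflects the distortionless response and the null towards the interferer mentioned above. Swapping the entries of `desired` would yield the weights that extract the second speaker instead.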

Details

Title

Multi-microphone simultaneous speakers detection and localization of multi-sources for separation and noise reduction

Author

Schwartz, Ayal¹; Schwartz, Ofer²; Chazan, Shlomo E.¹; Gannot, Sharon²

¹ Faculty of Engineering, Bar Ilan University, Ramat-Gan, Israel (GRID:grid.22098.31) (ISNI:0000 0004 1937 0503); Origin.AI, Ramat-Gan, Israel (GRID:grid.22098.31)
² Faculty of Engineering, Bar Ilan University, Ramat-Gan, Israel (GRID:grid.22098.31) (ISNI:0000 0004 1937 0503)

Publication year

2024

Publication date

Dec 2024

Publisher

Springer Nature B.V.

ISSN

16874714

e-ISSN

16874722

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1186/s13636-024-00365-3

ProQuest document ID

3112971680

© The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
