Abstract
This paper proposes a multichannel environmental sound segmentation method. Environmental sound segmentation is an integrated approach that achieves sound source localization, sound source separation, and classification simultaneously. When multiple microphones are available, spatial features can be used to improve the localization and separation accuracy of sounds from different directions; however, conventional methods have three drawbacks: (a) Methods that train sound source localization and separation using spatial features together with classification using spectral features in the same neural network may overfit to the relationship between the direction of arrival and the class of a sound, reducing their reliability on novel events. (b) Although permutation invariant training, as used in automatic speech recognition, could be extended to this task, it is impractical for environmental sounds, which include an unlimited number of sound sources. (c) Various features, such as the complex values of the short-time Fourier transform and inter-channel phase differences, have been used as spatial features, but no study has compared them. This paper proposes a multichannel environmental sound segmentation method comprising two discrete blocks: a sound source localization and separation block, and a sound source separation and classification block. By separating the blocks, overfitting to the relationship between the direction of arrival and the class is avoided. Simulation experiments on synthesized datasets containing 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of conventional methods.
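As an illustration of the spatial features compared in the abstract, the sketch below computes inter-channel phase differences (IPDs) from a multichannel signal. This is a generic, minimal implementation using NumPy, not the authors' pipeline; the STFT parameters (512-point Hann window, hop of 128) and the choice of microphone 0 as reference are assumptions for the example.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive short-time Fourier transform of one channel (Hann window).

    Returns an array of shape (n_frames, n_fft // 2 + 1); the complex
    values themselves are one of the spatial features the paper mentions.
    """
    window = np.hanning(n_fft)
    frames = [np.fft.rfft(window * x[s:s + n_fft])
              for s in range(0, len(x) - n_fft + 1, hop)]
    return np.array(frames)

def ipd_features(multichannel, ref=0, n_fft=512, hop=128):
    """Inter-channel phase differences relative to a reference microphone.

    multichannel: array of shape (n_channels, n_samples).
    Returns an array of shape (n_channels - 1, n_frames, n_bins) holding
    phase differences wrapped to (-pi, pi].
    """
    specs = np.stack([stft(ch, n_fft, hop) for ch in multichannel])
    ref_phase = np.angle(specs[ref])
    ipds = []
    for c in range(specs.shape[0]):
        if c == ref:
            continue
        diff = np.angle(specs[c]) - ref_phase
        # Wrap the difference back into (-pi, pi] via the unit circle.
        ipds.append(np.angle(np.exp(1j * diff)))
    return np.array(ipds)
```

Because IPDs depend on the time delay between microphones, they encode the direction of arrival; feeding them only into the localization/separation block, while the classification block sees spectral features, is what prevents the overfitting described above.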
Details
Itoyama Katsutoshi 1 ; Nishida Kenji 1 ; Nakadai Kazuhiro 2 1 Tokyo Institute of Technology, Department of Systems and Control Engineering, School of Engineering, Meguro-ku, Japan (GRID:grid.32197.3e) (ISNI:0000 0001 2179 2105)
2 Tokyo Institute of Technology, Department of Systems and Control Engineering, School of Engineering, Meguro-ku, Japan (GRID:grid.32197.3e) (ISNI:0000 0001 2179 2105); Honda Research Institute Japan Co., Ltd., Saitama, Japan (GRID:grid.471052.5) (ISNI:0000 0004 1763 7120)