Abstract

This paper proposes a multichannel environmental sound segmentation method. Environmental sound segmentation is an integrated approach that performs sound source localization, sound source separation, and classification simultaneously. When multiple microphones are available, spatial features can be used to improve the localization and separation accuracy of sounds arriving from different directions; however, conventional methods have three drawbacks: (a) When sound source localization and separation using spatial features and classification using spectral features are trained in the same neural network, the network may overfit to the relationship between the direction of arrival and the class of a sound, which reduces its reliability on novel events. (b) Although permutation invariant training, as used in automatic speech recognition, could be extended, it is impractical for environmental sounds, which can include an unlimited number of sound sources. (c) Various features, such as the complex values of the short-time Fourier transform and interchannel phase differences, have been used as spatial features, but no study has compared them. The proposed method comprises two discrete blocks: a sound source localization and separation block and a sound source separation and classification block. Separating the blocks avoids overfitting to the relationship between the direction of arrival and the class. Simulation experiments on created datasets containing 75 classes of environmental sounds showed that the root mean squared error of the proposed method was lower than that of conventional methods.

Details

Title
Multichannel environmental sound segmentation
Author
Sudo Yui 1; Itoyama Katsutoshi 1; Nishida Kenji 1; Nakadai Kazuhiro 2

1 Tokyo Institute of Technology, Department of Systems and Control Engineering, School of Engineering, Meguro-ku, Japan (GRID:grid.32197.3e) (ISNI:0000 0001 2179 2105)
2 Tokyo Institute of Technology, Department of Systems and Control Engineering, School of Engineering, Meguro-ku, Japan (GRID:grid.32197.3e) (ISNI:0000 0001 2179 2105); Honda Research Institute Japan Co., Ltd., Saitama, Japan (GRID:grid.471052.5) (ISNI:0000 0004 1763 7120)
Pages
8245-8259
Publication year
2021
Publication date
Nov 2021
Publisher
Springer Nature B.V.
ISSN
0924-669X
e-ISSN
1573-7497
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2579465122
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.