Abstract

Binaural audio technology has been in existence for many years. However, its popularity has significantly increased over the past decade as a consequence of advancements in virtual reality and streaming techniques. Along with its growing popularity, the quantity of publicly accessible binaural audio recordings has also expanded. Consequently, there is now a need for automated and objective retrieval of spatial content information, with ensemble location and width being the most prominent. This study presents a novel method for estimating these ensemble parameters in binaural recordings of music. For this purpose, a dataset of 23 040 binaural recordings was synthesized from 192 publicly available music recordings using 30 head-related transfer functions. The synthesized excerpts were then used to train a multi-task spectrogram-based convolutional neural network model, aiming to estimate the ensemble location and width for unseen recordings. The results indicate that a model for estimating ensemble parameters can be successfully constructed with low prediction errors: 4.76° (±0.10°) for ensemble location and 8.57° (±0.19°) for ensemble width. The method developed in this study outperforms previous spatiogram-based techniques recently published in the literature and shows promise for future development as part of a novel tool for analyzing binaural audio recordings.

Details

Title
Estimating Ensemble Location and Width in Binaural Recordings of Music with Convolutional Neural Networks
Author
Antoniuk, Paweł; Zieliński, Sławomir K
Pages
81–93
Publication year
2025
Publication date
2025
Publisher
Polish Academy of Sciences
ISSN
0137-5075
e-ISSN
2300-262X
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3180922469
Copyright
© 2025. This work is licensed under https://creativecommons.org/licenses/by-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.