Abstract

Music identification via audio fingerprinting has been an active research field in recent years. In real-world environments, music queries are often deformed by various interferences, which typically include signal distortions and time-frequency misalignments caused by time stretching, pitch shifting, etc. Robustness therefore plays a crucial role in any music identification technique. In this paper, we propose to use scale-invariant feature transform (SIFT) local descriptors computed from a spectrogram image as sub-fingerprints for music identification. Experiments show that these sub-fingerprints exhibit strong robustness against serious time stretching and pitch shifting simultaneously. In addition, a locality-sensitive hashing (LSH)-based nearest sub-fingerprint retrieval method and a matching determination mechanism are applied for robust sub-fingerprint matching, which makes the identification efficient and precise. Finally, as an auxiliary function, we demonstrate that by comparing the time-frequency locations of corresponding SIFT keypoints, the factors of time stretching and pitch shifting that music queries might have experienced can be accurately estimated.
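
The last point, recovering the time-stretch and pitch-shift factors from the locations of matched keypoints, can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not spell out. It assumes that keypoints have already been matched, that each keypoint is given as (time in seconds, frequency bin), and that the spectrogram uses a logarithmic frequency axis with a hypothetical `bins_per_octave` resolution, so that a pitch shift appears as a constant vertical offset. The function name `estimate_factors` and the use of medians for outlier tolerance are likewise illustrative choices.

```python
from itertools import combinations
from statistics import median


def estimate_factors(ref_pts, qry_pts, bins_per_octave=12):
    """Estimate (time_factor, pitch_factor) from matched keypoints.

    ref_pts, qry_pts: equal-length lists of (time_sec, log_freq_bin),
    where ref_pts[i] and qry_pts[i] are assumed to be a correct match.
    time_factor > 1 means the query is longer (slower) than the reference.
    pitch_factor > 1 means the query is shifted up in pitch.
    """
    if len(ref_pts) != len(qry_pts) or len(ref_pts) < 2:
        raise ValueError("need at least two matched keypoint pairs")

    # Time stretching scales the time gap between any two keypoints,
    # independent of where the query excerpt starts in the reference.
    ratios = []
    for i, j in combinations(range(len(ref_pts)), 2):
        dt_ref = ref_pts[j][0] - ref_pts[i][0]
        dt_qry = qry_pts[j][0] - qry_pts[i][0]
        if abs(dt_ref) > 1e-9:  # skip pairs at the same reference time
            ratios.append(dt_qry / dt_ref)
    if not ratios:
        raise ValueError("keypoints must span more than one time instant")
    time_factor = median(ratios)

    # Assumption: log-frequency axis, so pitch shifting is a constant
    # offset in bins; convert that offset to a frequency ratio.
    bin_shift = median(q[1] - r[1] for r, q in zip(ref_pts, qry_pts))
    pitch_factor = 2.0 ** (bin_shift / bins_per_octave)
    return time_factor, pitch_factor


# Made-up matches: query stretched by 1.25x, offset by 0.5 s,
# and shifted up by 2 bins (2 semitones at 12 bins per octave).
ref = [(0.0, 10), (1.0, 20), (2.0, 15), (4.0, 30)]
qry = [(0.5, 12), (1.75, 22), (3.0, 17), (5.5, 32)]
time_factor, pitch_factor = estimate_factors(ref, qry)
```

Taking medians rather than means keeps a few wrong matches from dominating the estimate. On a linear frequency axis the pitch factor would instead be estimated from ratios of frequencies rather than differences of bins.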

Details

Title
SIFT-based local spectrogram image descriptor: a novel feature for robust music identification
Author
Zhang, Xiu; Zhu, Bilei; Li, Linwei; Li, Wei; Li, Xiaoqiang; Wang, Wei; Lu, Peizhong; Zhang, Wenqiang
Pages
1-15
Publication year
2015
Publication date
Feb 2015
Publisher
Springer Nature B.V.
ISSN
16874714
e-ISSN
16874722
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1657513181
Copyright
The Author(s) 2015