Introduction
Identifying the location of a source from the signal received by an array of several sensors is an important problem in signal processing, arising in many applications such as target detection in radar1, user tracking in wireless systems2, indoor presence detection3, virtual reality, and consumer audio. Localization is a classical problem that has been widely studied in the literature.
A commonly-used method to estimate the location or direction of arrival (DoA) of a source from the signals received at the array is to apply reverse beamforming to the incident signals. Reverse beamforming combines the received array signals, in the time or frequency domain, according to a signal propagation model, to “steer” the array towards a putative target. The true DoA of an audio source can then be estimated by finding the steering direction that yields the highest received power at the array output. Commonly-used super-resolution methods such as MUSIC4 and ESPRIT5 build on reverse beamforming for DoA estimation. Beyond source localization, beamforming in its various forms appears as the first stage of spatial signal processing in applications such as audio source separation in the cocktail party problem6,7 and spatial user grouping in wireless communication8.
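As a rough illustration of this steered-power principle (not the method of any cited work), the following sketch steers a uniform linear array of microphones towards a grid of candidate directions with a delay-and-sum beamformer and picks the direction of maximum output power. All parameters (array geometry, source frequency, source angle, noise level) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

c = 343.0                 # speed of sound (m/s), assumed
f = 1000.0                # narrowband source frequency (Hz), assumed
d = 0.05                  # sensor spacing (m), assumed
M = 8                     # number of sensors, assumed
true_doa = np.deg2rad(40.0)

# Far-field narrowband model: each sensor observes a phase-shifted
# copy of the same complex sinusoid plus a small amount of noise.
t = np.arange(2048) / 16000.0
delays = d * np.arange(M) * np.sin(true_doa) / c
x = np.exp(2j * np.pi * f * (t[None, :] - delays[:, None]))
x += 0.1 * (rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape))

# Scan candidate angles: apply the conjugate steering phases for each
# direction, sum across sensors, and measure the output power.
angles = np.deg2rad(np.arange(-90, 91))
powers = []
for a in angles:
    steer = np.exp(2j * np.pi * f * d * np.arange(M) * np.sin(a) / c)
    y = steer @ x / M                 # delay-and-sum beamformer output
    powers.append(np.mean(np.abs(y) ** 2))

# The DoA estimate is the direction of maximum steered power.
est = np.rad2deg(angles[int(np.argmax(powers))])
print(f"estimated DoA: {est:.0f} degrees")
```

The power is maximized when the steering phases exactly cancel the propagation delays, i.e. when the candidate angle matches the true DoA; super-resolution methods such as MUSIC refine this scan using the eigenstructure of the array covariance rather than raw output power.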
Conventional beamforming approaches assume that incident signals are far-field narrowband sinusoids, and use knowledge of the microphone array geometry to specify phase shifts between the microphone inputs that effectively steer the array towards a particular direction9. In practice, however, most audio signals are not narrowband and may have unknown spectral characteristics, so the required phase shifts cannot be derived analytically. The conventional solution is to decompose the incoming signal into narrowband components, via a dense filterbank or a Fourier-transform approach, and then apply narrowband beamforming separately in each band. The accuracy of these approaches relies on a large number of frequency bands, which increases the implementation complexity and resource requirements proportionally.
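The filterbank idea can be sketched as follows: decompose a broadband signal into FFT bins and apply a separate narrowband phase-shift beamformer in each bin, accumulating the steered power over bins. This is only a minimal subband illustration under assumed parameters, not a specific published algorithm; note how the per-bin steering vectors multiply the implementation cost by the number of bands.

```python
import numpy as np

rng = np.random.default_rng(1)

c, fs = 343.0, 16000.0          # speed of sound, sample rate (assumed)
d, M, N = 0.04, 8, 4096         # spacing, sensors, block length (assumed)
true_doa = np.deg2rad(-25.0)

# Broadband source: white noise, delayed at each sensor by applying a
# frequency-domain phase ramp (an exact fractional delay per sensor).
s = rng.standard_normal(N)
S = np.fft.rfft(s)
freqs = np.fft.rfftfreq(N, 1.0 / fs)
delays = d * np.arange(M) * np.sin(true_doa) / c
X = S[None, :] * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
X += 0.05 * (rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape))

# Keep bins below the spatial aliasing limit c / (2 d).
band = (freqs > 100.0) & (freqs < c / (2 * d))
fb = freqs[band]

# Per-bin narrowband beamforming: one phase-shift steering vector per
# frequency bin, with the steered power summed across bins.
angles = np.deg2rad(np.arange(-90, 91))
powers = np.zeros(len(angles))
for i, a in enumerate(angles):
    steer = np.exp(2j * np.pi * d * np.sin(a) / c * np.outer(np.arange(M), fb))
    y = np.sum(steer * X[:, band], axis=0) / M
    powers[i] = np.sum(np.abs(y) ** 2)

est = np.rad2deg(angles[int(np.argmax(powers))])
print(f"broadband DoA estimate: {est:.0f} degrees")
```

Because each of the (here, roughly two thousand) retained bins needs its own steering computation, the cost of this scheme grows in proportion to the number of frequency bands, which is exactly the complexity burden noted above.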
Auditory source localization forms a crucial part of biological signal processing for vertebrates, and plays a vital role in an animal’s perception of 3D space10,11. Neuroscientific studies indicate that the auditory perception of space occurs through inter-aural time- and level-differences, with an angular resolution depending on the wavelength of the incoming signal12.




