1. Introduction
The increasing number of smart mobile devices (such as laptops and smartphones) that can interact with their surrounding world through their many sensors enables one to envision several new and exciting applications. Examples of such applications include indoor navigation [1], location-based services [2], patient monitoring [3], and many others involving ad hoc microphone arrays [4–11]. These applications have one thing in common: they require the estimation of the mobile device position.
Based on the specific technology employed, it is possible to categorize methodologies for localization of mobile devices into three distinct groups [12]: (i) systems that utilize ancillary sensors, such as accelerometers and magnetometers; (ii) systems that rely on the received signal strength of radio signals; and (iii) systems that use the time-of-flight (TOF) of acoustic signals to perform localization. This paper focuses on (iii), since acoustic sensor localization (ASL) systems enable centimeter-scale localization and are quite inexpensive, as they only require acoustic sensors (i.e., microphones) in the mobile devices.
ASL techniques relying on TOF information may or may not require prior knowledge of the loudspeakers’ positions. For instance, many works have tackled the problem of microphone array calibration [13–16] assuming that the sources’ positions are known and aiming to estimate (or refine an initial estimate of) the acoustic sensor location. In such scenarios, synchronization problems are not a major concern, since all sources and sinks are usually under the control of whoever wants to calibrate the microphone array. On the other hand, if the sources’ positions are unknown, then joint source and sensor localization is commonly employed [17, 18]. In this case, the lack of synchronism may severely degrade the mobile position estimate. In [7], a solution that deals explicitly with synchronization was presented. In [8], a solution considering multiple sources and sensors per device was described.
When it comes to the performance of ASL techniques, the accuracy of TOF estimates is of paramount importance. Roughly speaking, in order to obtain an accurate TOF estimate, the sensor (microphone) should be able to determine the exact time a given acoustic signal takes to propagate from the source. Hence, the acoustic signals employed, herein called probe signals, should be carefully selected, taking into consideration practical issues like noise, interference, and especially reverberation. Recently, several papers have addressed the problem of designing good probe signals and improving TOF estimation. In [19], a probe signal design using the pulse compression technique and hyperbolic frequency-modulated signals was presented; the proposed technique is able to localize an acoustic source and, if it is moving, to estimate its velocity and direction. In addition, a matching pursuit-based algorithm for TOF estimation was described in [20] and later refined in [12], with promising results in both papers. In [21], an iterative peak matching algorithm for the calibration of a wireless acoustic sensor network was described.
The ASL methods described in this work assume that the sources’ positions and probe signals are known at the receiver (i.e., the mobile device containing a microphone), but do not count on synchronism between sensor and source nor on cooperation with other mobile devices: all processing must take place on the sensor itself, allowing the sensor to self-localize indoors. Nevertheless, a robust ASL system with such characteristics needs to address several issues, such as reverberation, asynchrony between acoustic source and sensor, poor estimation of the speed of sound, and noise. In addition, the proposed solutions need to be computationally inexpensive in order to run on the mobile device itself, which in general is battery-operated.
1.1. Objectives
The main objectives of this paper are
(i) to review some state-of-the-art techniques for ASL;
(ii) to describe in detail the challenges ASL algorithms face in practical environments;
(iii) to present in a unified context some recent methods proposed by the authors to tackle such practical challenges;
(iv) to propose a novel ASL algorithm that combines different state-of-the-art techniques with a new region-based search.
For beginners in the ASL field, it can be a Herculean task to go through the myriad of techniques addressing ASL problems. The concise approach adopted here provides beginners with a glimpse of the main problems and solutions in the ASL field. Experienced researchers can also benefit from the unified approach covering recent ideas that are not yet standard in the area, such as the ones described in [22–24]. In fact, many research opportunities may open up thanks to the unified way ASL problems and related solutions are described here. Finally, low-power, high-frequency probe signals that are inaudible to most people are proposed, along with a new region-based search that is shown to be robust when combined with matching pursuit TOF-selection strategies.
1.2. Organization
This paper is organized as follows. Section 2 provides basic definitions and formally states the ASL problem. Classical solutions are described in Section 3, whereas improved algorithms are described in Section 4. A novel ASL system is proposed in Section 5, and the respective experimental results are presented in Section 6. Concluding remarks are drawn in Section 7.
2. Background
2.1. Acoustic Networks
An acoustic network is formed when there exists an acoustic coupling among loudspeakers (herein also called sources) and microphones (herein also denominated sensors). In a general configuration, an acoustic network is composed of a set of sources and a set of sensors that share the same acoustic environment and may exchange signals with one another.
This paper addresses the problem of acoustic sensor localization (ASL) without sensor cooperation. In this case, each microphone individually uses the signals received from the loudspeakers, whose positions and probe signals are known, to estimate its own position, without exchanging information with other sensors.
2.2. Basic Definitions
Let $\mathbf{s}_m \in \mathbb{R}^3$, with $m \in \{1, \ldots, M\}$, denote the known position of the $m$th loudspeaker, and let $\mathbf{x} \in \mathbb{R}^3$ denote the unknown position of the microphone. The TOF $\tau_m$ is the time the probe signal emitted by the $m$th loudspeaker takes to reach the microphone through the direct path, that is, $\tau_m = \lVert \mathbf{x} - \mathbf{s}_m \rVert / c$, where $c$ is the speed of sound.
Instead of using the TOF directly, there are also some ASL techniques that rely on the time-difference-of-flight (TDOF) to estimate the sensor position. The TDOF related to loudspeakers $m$ and $n$ is simply the difference $\tau_{mn} = \tau_m - \tau_n$ between the corresponding TOFs; its main appeal is that any time offset common to all TOF measurements cancels out in the difference.
2.3. TOF Estimation
One of the simplest procedures to obtain a TOF estimate $\hat{\tau}_m$ consists of computing the cross-correlation function (CCF) between the signal acquired by the microphone and the $m$th probe signal, and then taking the lag associated with the highest CCF peak.
In order to fully understand the above procedure, consider an ideal setup in which there exists line-of-sight (LOS), that is, a direct path connecting loudspeaker and microphone, and the propagation medium does not introduce any distortion to the emitted signal. In this scenario, the received signal is a delayed version of the emitted signal, that is, $y(t) = \alpha\, p_m(t - \tau_m)$, where $p_m(t)$ is the $m$th probe signal and $\alpha > 0$ is an attenuation factor.
The CCF essentially measures the similarity between the signals $y(t)$ and $p_m(t)$ for every relative lag $\tau$, and can be written as $R_m(\tau) = \int y(t)\, p_m(t - \tau)\, \mathrm{d}t$.
Note that, in the ideal setup with $y(t) = \alpha\, p_m(t - \tau_m)$, the CCF is maximized exactly at $\tau = \tau_m$, provided the probe signal has a sharp autocorrelation.
Practical scenarios usually face moderate to severe reverberation, which essentially means that the received signal contains, besides the direct-path component, several attenuated and delayed replicas of the probe signal produced by reflections on walls and objects; these replicas generate spurious CCF peaks that may mask the direct-path peak.
The TOF estimate can eventually be computed as $\hat{\tau}_m = \arg\max_{\tau} R_m(\tau)$, possibly replacing the plain CCF with a generalized cross-correlation (GCC) function [25] to improve robustness against noise and reverberation.
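To make the procedure concrete, the following NumPy sketch estimates the TOF of a single probe signal by peak-picking the CCF; the 48-kHz sampling rate, the white-noise probe, and the toy delay are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def estimate_tof_ccf(received, probe, fs, c=343.0):
    """Estimate the TOF of one probe signal by cross-correlating it with
    the received signal and picking the lag of the highest CCF peak."""
    ccf = np.correlate(received, probe, mode="full")
    lags = np.arange(-len(probe) + 1, len(received))   # lag (in samples) of each CCF entry
    k_hat = lags[np.argmax(ccf)]                       # lag of the highest peak
    tof_hat = k_hat / fs                               # seconds
    return tof_hat, tof_hat * c                        # TOF and the implied distance

# Toy example: a noisy, attenuated, 5-ms-delayed copy of a 10-ms white-noise probe.
fs = 48000
rng = np.random.default_rng(0)
probe = rng.standard_normal(fs // 100)
received = np.zeros(fs // 10)
delay = int(0.005 * fs)
received[delay:delay + len(probe)] += 0.3 * probe
received += 0.01 * rng.standard_normal(len(received))
print(estimate_tof_ccf(received, probe, fs))           # approx. (0.005, 1.715)
```

In practice, the plain correlation above would be replaced by a GCC variant and the peak search restricted to physically plausible lags, as discussed in Section 4.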
One can employ TOF- or TDOF-based ASL techniques depending on the kind of errors present in each estimate; in particular, when all TOF estimates share a common unknown offset caused by asynchrony, TDOF-based techniques are naturally more robust, since that offset cancels out in the differences.
2.4. Common Practical Issues
The accuracy of ASL techniques depends on how close the estimates $\hat{\tau}_m$ are to the actual TOFs $\tau_m$; unfortunately, several practical issues tend to widen this gap.
There are also some practical issues, like reverberation and interference, that corrupt the received signal and, consequently, the CCFs/GCCs from which the TOFs are estimated.
In general, if source and/or sensor are moving, then the Doppler effect arises and may further corrupt the estimation of the TOFs, since the received probe signals become time-scaled versions of the emitted ones.
The remainder of this section is devoted to some issues involving time measurements, namely, asynchrony and frequency mismatch between loudspeakers and microphones. When there is asynchrony between microphone and loudspeaker, the argument that maximizes the CCF no longer corresponds to the actual TOF, but rather to the actual TOF plus an unknown offset induced by the difference between the emission and acquisition time references.
By assuming that this offset remains approximately constant during the acquisition interval, that is, that the frequency mismatch between the loudspeakers' and the microphone's clocks is negligible, the measured TOFs can be related to the actual propagation delays through the affine model in (9).
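Since equation (9) itself is not reproduced in this extraction, the following is one plausible form of such an affine model, assuming a common constant clock offset $\Delta$ and a small relative clock drift $\eta$ (the notation is ours):

```latex
\hat{\tau}_m \;=\; (1+\eta)\,\frac{\lVert \mathbf{x} - \mathbf{s}_m \rVert}{c}
\;+\; \Delta \;+\; \epsilon_m , \qquad m = 1, \dots, M,
```

where $\epsilon_m$ gathers the remaining estimation errors. When $\eta \approx 0$, the model reduces to the actual TOF plus a common bias, which is precisely the term that TDOF-based techniques cancel out.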
Once the main definitions and common practical issues have been described, one now has all basic tools to address the sensor localization problem itself.
3. ASL Algorithms
As mentioned in Section 1, there are many different solutions to the problem of localizing an acoustic sensor. This section describes three state-of-the-art families of algorithms that span the main ASL approaches. Section 3.1 details TOF- and TDOF-based least-squares (LS) solutions [23] built on the affine model in (9). Section 3.2 indicates how the steered-response power (SRP) technique, originally targeted at the problem of sound source localization (SSL) using microphone arrays [27], can be adapted to work in the ASL context. Section 3.3 describes a new localization approach that relies on searching for regions (cuboids) that are likely to contain the sensor.
3.1. Least-Squares
In the LS procedure, many TOF or TDOF measurements are collected and used to estimate the microphone position. In what follows, the TOF-based LS formulation is presented first, followed by its TDOF-based counterpart and the corresponding closed-form solutions.
3.1.1. TOF-Based Problem
Motivated by the relation in (9), let the error associated with the $m$th loudspeaker be defined as the difference between the measured TOF and the TOF predicted by a candidate sensor position (plus the unknown offset); stacking the errors of all loudspeakers yields an error vector whose squared norm is the cost function to be minimized.
3.1.2. TDOF-Based Problem
Without loss of generality, assume that the first loudspeaker is taken as the reference, so that the TDOFs are defined with respect to it; the error associated with each remaining loudspeaker is then defined analogously to the TOF-based case, with the advantage that the unknown time offset cancels out.
3.1.3. LS Solutions
For both LS approaches, the error vector can be written in a general form as an affine function of an augmented unknown vector that gathers the sensor position and an extra range-related variable, say $\mathbf{e} = \mathbf{A}\boldsymbol{\theta} - \mathbf{b}$, where $\mathbf{A}$ and $\mathbf{b}$ are built from the known loudspeaker positions and the measured TOFs or TDOFs.
Although the entries of $\boldsymbol{\theta}$ are not independent (the extra variable is itself a function of the sensor position), this dependence can be ignored in order to obtain a simple unconstrained LS problem, whose closed-form solution is $\hat{\boldsymbol{\theta}} = (\mathbf{A}^{\mathrm{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathrm{T}}\mathbf{b}$ [28, 29].
When one can assume that the measurement errors are zero-mean and have similar variances, this plain LS solution is adequate; otherwise, a weighted LS formulation, in which each equation is scaled according to the reliability of the corresponding measurement, tends to provide better accuracy.
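As an illustration of the kind of closed-form solution discussed above, the sketch below implements one standard TDOF-based linearization (range differences with respect to a reference loudspeaker and an augmented unknown containing the reference range), solved by unconstrained least squares. This is a generic textbook formulation, not necessarily the exact one of [22, 23] or of equation (9).

```python
import numpy as np

def tdof_ls_localize(S, tdof, c=343.0):
    """Unconstrained LS sensor localization from TDOFs.

    S    : (M, 3) array of loudspeaker positions (row 0 is the reference).
    tdof : (M-1,) TDOFs of loudspeakers 1..M-1 w.r.t. loudspeaker 0 (seconds).

    The range differences d_m = ||x - s_m|| - ||x - s_0|| lead to equations
    that are linear in the augmented unknown [x, r0], with r0 = ||x - s_0||;
    the dependence of r0 on x is ignored, yielding a closed-form solution.
    """
    s0 = S[0]
    d = c * np.asarray(tdof)                           # range differences (meters)
    A = np.hstack([2.0 * (s0 - S[1:]), -2.0 * d[:, None]])
    b = np.sum(s0**2) - np.sum(S[1:]**2, axis=1) + d**2
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta[:3]                                   # estimated sensor position

# Toy check with noiseless TDOFs.
S = np.array([[0., 0., 2.], [5., 0., 2.], [5., 4., 2.], [0., 4., 2.], [2.5, 2., 2.8]])
x_true = np.array([1.2, 3.1, 1.5])
r = np.linalg.norm(S - x_true, axis=1)
print(np.round(tdof_ls_localize(S, (r[1:] - r[0]) / 343.0), 3))   # approx. [1.2, 3.1, 1.5]
```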
It is worth pointing out the strong dependence of the LS solutions on the estimated TOFs or TDOFs, whose errors may have a harmful effect on the localization accuracy. This fact motivates the use of a different approach that tries to estimate the sensor position directly, without resorting to prior TOF or TDOF estimation. SRP-inspired solutions allow one to do that, as described in the next section.
3.2. Steered-Response Power
The SRP technique was originally proposed for SSL problems using microphone arrays, but it can be adjusted to ASL as follows.
In this context, the SRP divides the search space into a grid of points representing possible candidates for the microphone position and measures the overall similarity among emitted and received signals related to each of these points. The point with the highest similarity is selected as the estimate of the microphone position. Thus, the SRP in the ASL context searches for the grid point that maximizes the sum, over all loudspeakers, of the CCF/GCC values evaluated at the lags implied by that point.
When using a dense grid of points covering the entire search space, the SRP technique achieves accurate position estimates even in reverberant scenarios, as all TOFs are considered simultaneously in a joint optimization procedure. However, the SRP drawback is its high computational burden due to the exhaustive search throughout the many points of the grid.
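The following sketch makes the SRP-for-ASL search explicit: every candidate point is scored by summing, over the loudspeakers, the CCF/GCC value at the lag that the point would imply. The function name, the assumption that each CCF/GCC is indexed by lags measured from the corresponding emission instant, and the grid spacing are ours.

```python
import numpy as np

def srp_asl(grid_points, S, gccs, fs, c=343.0):
    """Score every candidate sensor position by summing, over all loudspeakers,
    the CCF/GCC value at the lag implied by that position.

    grid_points : (P, 3) candidate positions.
    S           : (M, 3) loudspeaker positions.
    gccs        : list of M 1-D arrays; gccs[m][k] is the CCF/GCC of loudspeaker m
                  at lag k samples, measured from that loudspeaker's emission instant.
    """
    best_score, best_point = -np.inf, None
    for p in grid_points:
        tofs = np.linalg.norm(S - p, axis=1) / c        # TOFs implied by this point
        score = 0.0
        for m, g in enumerate(gccs):
            k = int(round(tofs[m] * fs))
            if 0 <= k < len(g):
                score += g[k]
        if score > best_score:
            best_score, best_point = score, p
    return best_point, best_score

# Example of a dense 10-cm grid over a 6 m x 5 m x 3 m room (~100k points).
xs, ys, zs = (np.arange(0.0, d + 1e-9, 0.1) for d in (6.0, 5.0, 3.0))
grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1).reshape(-1, 3)
```

The sheer number of grid points in this example illustrates the computational burden mentioned above and motivates the hierarchical alternatives discussed next.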
It is important to highlight here that SRP-inspired solutions have not been thoroughly explored in the ASL context. For example, the impact of the affine model in (9) on the performance of SRP-inspired solutions should be studied, especially with regard to the bias term induced by the asynchrony between source and sensor.
The next section describes a new technique that puts together a prior step of TDOF estimation, which usually yields computationally simple solutions, and the SRP-inspired idea of an exhaustive search over a grid of regions, which tends to improve the robustness of the method.
3.3. Region-Based Search
Consider a division of the whole search region into a set of nonoverlapping cuboids. The goal is to identify the cuboid that most likely contains the sensor: for each cuboid, one can determine the range of TDOF values that would be observed if the sensor were anywhere inside it, so that an estimated TDOF is said to be coherent with a cuboid whenever it falls within that range.
Each TDOF estimate is usually consistent with several cuboids and thus does not suffice to determine the cuboid that indeed contains the sensor; in fact, one has to aggregate information from all loudspeakers in order to choose the cuboid that most likely contains the sensor. Mathematically, by defining the coherence test through an indicator function that equals 1 when a given TDOF estimate is coherent with a given cuboid and 0 otherwise, the chosen cuboid is the one that maximizes the number of coherent TDOFs.
Alternatively, an objective function that weighs each coherent TDOF by the corresponding CCF/GCC value, rather than merely counting coherent TDOFs, could also be employed.
The search procedure usually begins with a coarse volumetric grid covering the entire search space [32]. For instance, one can divide each spatial dimension into a few intervals, select the most coherent cuboid, and then subdivide it recursively, repeating the procedure until the cuboid dimensions reach the desired spatial resolution.
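A possible implementation of the region-based search is sketched below. The coherence test uses conservative minimum/maximum distance bounds from each cuboid to the loudspeakers, and the refinement greedily subdivides only the best cuboid; both are simplifying choices of ours and not necessarily the exact procedure of [45].

```python
import numpy as np

def dist_bounds(box_lo, box_hi, s):
    """Min and max distance from point s to the axis-aligned box [box_lo, box_hi]."""
    closest = np.clip(s, box_lo, box_hi)
    farthest = np.where(np.abs(s - box_lo) > np.abs(s - box_hi), box_lo, box_hi)
    return np.linalg.norm(s - closest), np.linalg.norm(s - farthest)

def coherence_count(box_lo, box_hi, S, tdof, c=343.0):
    """Count how many estimated TDOFs (w.r.t. loudspeaker 0) are coherent
    with the cuboid, using conservative distance bounds."""
    lo0, hi0 = dist_bounds(box_lo, box_hi, S[0])
    count = 0
    for m in range(1, len(S)):
        lom, him = dist_bounds(box_lo, box_hi, S[m])
        td_min, td_max = (lom - hi0) / c, (him - lo0) / c   # achievable TDOF interval
        count += td_min <= tdof[m - 1] <= td_max
    return count

def region_search(room_lo, room_hi, S, tdof, levels=6, splits=2):
    """Hierarchical search: keep subdividing the most coherent cuboid."""
    lo, hi = np.asarray(room_lo, float), np.asarray(room_hi, float)
    for _ in range(levels):
        edges = [np.linspace(lo[d], hi[d], splits + 1) for d in range(3)]
        best, best_box = -1, (lo, hi)
        for i in range(splits):
            for j in range(splits):
                for k in range(splits):
                    blo = np.array([edges[0][i], edges[1][j], edges[2][k]])
                    bhi = np.array([edges[0][i + 1], edges[1][j + 1], edges[2][k + 1]])
                    score = coherence_count(blo, bhi, S, tdof)
                    if score > best:
                        best, best_box = score, (blo, bhi)
        lo, hi = best_box
    return (lo + hi) / 2.0    # center of the final cuboid
```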
At this point, it might be clear that the practical issues mentioned in Section 2.4 impair the accuracy of the TOF or TDOF estimates. Since the new region-based approach of this section and the LS approaches in Section 3.1 strongly depend on these estimates, it becomes crucial to obtain accurate TOF/TDOF estimates which are robust to these practical issues. This is why the techniques presented in Section 4 focus on improving TOF estimates.
4. Improving TOF/TDOF Estimates
This section describes three techniques from the literature that improve the accuracy of TOF/TDOF estimates. Section 4.1 details the sliding windows approach, which resorts to physical constraints to define time-windows likely to contain the actual TOFs. Section 4.2 describes a method that uses matching pursuit (MP) algorithms to select candidate TOF estimates within the previously defined time-windows, while cleaning spurious components that appear in CCFs or GCC functions. Section 4.3 shows how one can change the selected TOFs in order to improve the final localization accuracy.
4.1. Sliding Windows
Each CCF (or GCC) may present many peaks, hampering the task of detecting the peak associated with the direct path. In order to overcome this difficulty, a set of physical constraints the actual TOFs must satisfy can be imposed on the search for the direct-path CCF peak. These constraints stem from room geometry and probe signals’ inherent structure, such as duration and cyclical nature—ubiquitous in asynchronous passive ASL systems. The sliding windows (SW) approach proposed in [23] is a robust and computationally efficient technique employed to search for CCF peaks while taking into account the aforementioned constraints.
The key idea underlying the SW technique is to jointly search for the direct-path peaks across all CCFs simultaneously. This joint optimization is conducted by adding the selected peaks within fixed-duration windows defined in order to account for possible errors due to acoustic impairments, such as reverberation, interference, and LOS obstruction. The duration of those windows satisfies physical constraints based on room dimensions (e.g., the maximum lag can be upper bounded by the room diameter—maximum distance between points within the room—divided by the speed of sound). Moreover, the knowledge of the delay between consecutive emissions of different loudspeakers as well as the probe signals’ durations is used to constrain the distance between windows from the corresponding CCFs.
Figure 1 illustrates the way SW technique works in a simplified setup. Consider an anechoic room where four loudspeakers emit cyclical impulses as probe signals. Assume there is a fixed delay between emissions from different loudspeakers, which is handy in practical reverberant environments for it helps to deal with signal superpositions, although not mandatory in this toy-example. The first step is to define time-windows (depicted as green boxes in Figure 1) based on the maximum admissible propagation delays within the room. After that, the windows (one for each CCF) are spaced apart based on the fixed delay between consecutive emissions of probe signals. Then, the values of the highest CCF peak within each window are added and stored. A sliding windows process takes place in order to evaluate the initial time-index for which the aforementioned accumulated peak values achieve their maximum. Figure 1(a) shows the windows in a position where the accumulated peak values are not maximal; the sliding process continues in a cyclical fashion, accounting for the cyclical emission of probe signals, as illustrated in Figure 1(b), where the maximum is finally achieved. Thus, the SW technique also has the ability to blindly detect the emission order of the probe signals in the acquired signal, regardless of the order ambiguity induced by the asynchrony among transmitters and receivers.
[figures omitted; refer to PDF]
Mathematically, the SW technique yields a set of time-windows, one per CCF, whose common initial time-index maximizes the sum of the highest CCF peaks found inside the windows.
It is noteworthy that the highest peak inside each resulting window is not necessarily associated with the direct path; it merely provides an initial TOF candidate, which can be refined by the techniques described in the next subsections.
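The sketch below captures the essence of the SW search under the assumptions that each CCF spans exactly one emission cycle and that the relative emission offsets are known in samples; the brute-force scan over all start indices is used for clarity and may differ from the actual implementation in [23].

```python
import numpy as np

def sliding_windows(ccfs, win_len, offsets):
    """Jointly place one window per CCF so that the sum of the highest
    peak inside each window is maximized.

    ccfs    : (M, L) array, one cyclic CCF per loudspeaker (L = one emission cycle).
    win_len : window length in samples (from the maximum plausible propagation delay).
    offsets : (M,) known emission offsets of each loudspeaker within the cycle (samples).
    """
    M, L = ccfs.shape
    best_score, best_start = -np.inf, 0
    for n in range(L):                       # candidate common start index
        score = 0.0
        for m in range(M):
            idx = (n + offsets[m] + np.arange(win_len)) % L   # cyclic window
            score += ccfs[m, idx].max()
        if score > best_score:
            best_score, best_start = score, n
    # Lag of the highest peak of each CCF inside its chosen window (TOF candidates).
    tof_candidates = []
    for m in range(M):
        idx = (best_start + offsets[m] + np.arange(win_len)) % L
        tof_candidates.append(idx[np.argmax(ccfs[m, idx])])
    return best_start, np.array(tof_candidates)
```

If the brute-force scan becomes too costly, a sliding-maximum structure can reduce the per-CCF cost from O(LW) to O(L).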
4.2. Matching Pursuit
Using the lag associated with the highest CCF peak as TOF estimate is equivalent to employing matched filters for the TOF-estimation task [38]. As seen before, multipath propagation, along with other acoustic phenomena, modifies the desired correlation properties of the probe signals, hampering the TOF-estimation task. A way to circumvent such issues consists of performing a precise estimation of the channel-impulse response (CIR), which can be done if the probe signals are known. The time stamp of the first nonzero coefficient of the estimated CIR should be the desired TOF in an ideal setup.
A natural candidate for CIR estimation is the maximum likelihood (ML) estimator, whose solution, under some mild hypotheses and in the absence of measurement noise and interference, is equivalent to an unconstrained and nonregularized least-squares (LS) estimate [39] (in fact, the equivalence remains even under normally distributed noise [40]).
Besides its lack of robustness under non-Gaussian perturbations [41], the ML solution is not feasible for real-time localization systems due to its high computational complexity. The authors in [20] proposed a greedy pursuit for CIR estimation, specifically the matching pursuit (MP) algorithm [42]. Such a strategy was motivated by [43], which developed a CIR estimation algorithm for multiuser environments in code division multiple access (CDMA) systems, aiming at TOF-based radio localization. The authors in [43] compared the ML and MP techniques, concluding that the latter outperforms the former. The inferior ML performance is attributed to the underlying overparameterization of the CIR, which makes the detection of the actual TOF unreliable [43].
The MP algorithm relies on sparse representations of the signals of interest [44], working by progressively isolating the signal structures that are coherent with a predefined dictionary of signals. In the context of ASL, the works [12, 20] employ the MP algorithm to describe an excerpt of the recorded signal as a linear combination of delayed versions of one probe signal. Such a decomposition makes it possible to infer the direct-path delay even when its CCF/GCC peak is highly attenuated compared to the peaks corresponding to reflected-path delays.
Suppose one has good reasons to believe that the early arrivals of the signal emitted by the $m$th loudspeaker are confined to a given excerpt of the recorded signal (for instance, the time-window provided by the SW technique). The MP algorithm then iteratively decomposes that excerpt into delayed and scaled copies of the corresponding probe signal, as summarized in Algorithm 1.
Algorithm 1: TOF estimation by MP algorithm. [algorithm listing omitted; refer to PDF]
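For concreteness, here is a compact MP decomposition over a dictionary of delayed copies of a single probe signal, in the spirit of [12, 20, 42]; the fixed number of iterations and the candidate-picking heuristic at the end are our own simplifications, not the exact steps of Algorithm 1.

```python
import numpy as np

def mp_tof_candidates(excerpt, probe, n_iter=5):
    """Greedy matching-pursuit decomposition of a recorded excerpt into
    delayed copies of one probe signal; returns candidate delays (samples)
    in the order they were selected (the direct path is expected among the first)."""
    residual = np.array(excerpt, dtype=float)
    probe = np.asarray(probe, dtype=float)
    energy = float(np.dot(probe, probe))
    delays, gains = [], []
    for _ in range(n_iter):
        # Inner products of the residual with every delayed copy of the probe.
        corr = np.correlate(residual, probe, mode="valid")
        k = int(np.argmax(np.abs(corr)))               # best-matching delay
        g = corr[k] / energy                           # its optimal gain
        residual[k:k + len(probe)] -= g * probe        # peel that component off
        delays.append(k)
        gains.append(g)
    return np.array(delays), np.array(gains)

def pick_tof(delays, gains, rel_thresh=0.3):
    """A common heuristic: earliest delay whose gain is a significant
    fraction of the strongest one."""
    strong = np.abs(gains) >= rel_thresh * np.max(np.abs(gains))
    return delays[strong].min()
```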
When the direct-path component is indeed present in the analyzed excerpt, the delays selected in the first MP iterations tend to concentrate around the actual TOF, as illustrated in the example below.
[figures omitted; refer to PDF]
It should be pointed out that, even when Algorithm 1 fails to place the direct-path delay first, it yields a small number of candidate delays among which the actual TOF is often found; this observation motivates the TOF-selection strategy described in the next subsection.
4.3. TOF Selection
As explained before, reverberation and LOS obstruction between loudspeaker and sensor nodes (and other issues) may severely impair the TOF estimate and, therefore, the overall localization procedure [45]. One way of tackling those issues consists of wisely selecting the TOFs from the peaks of the CCFs/GCCs when they are not the highest ones, or even discarding some CCF/GCC information when the actual TOF cannot be found in it [23, 45]. Note that such strategy, henceforth called “TOF selection” or simply TS, does not perform an exhaustive search over a grid of spatial points nor requires evaluations of complex functionals; therefore, it is expected to demand fewer numerical operations than SRP-inspired algorithms.
Each CCF/GCC usually provides many peaks and, consequently, a combinatorial search for the best combination of peaks may be infeasible in some applications. In order to perform a computationally efficient search, the TS technique should rely on physically plausible heuristics. One such heuristic makes use of the fact that spurious CCF/GCC peaks generated by multipath propagation often occur after the correct peak, because non-LOS paths are longer [23]; this assumption motivates a search among peaks that occur before the current one whenever its correctness is under question. Another heuristic takes into account that, in a high-SNR regime without LOS obstruction, the magnitude of an erroneously detected peak is slightly larger than the magnitude of the correct CCF/GCC peak [23]. A more common hypothesis states that the highest CCF/GCC peak is often associated with the desired TOF information.
Such heuristics are not sufficient for reliable TOF estimation in practical environments. In general, they should be combined with an objective function that incorporates geometric constraints in order to assess a specific set of candidate TOFs. Such an assessment, which takes into account the available a priori information, is key to solving inverse problems [31]. One example of such a geometric function is the maximum discrepancy between the TOFs (or TDOFs) estimated from the CCFs/GCCs and the TOFs (or TDOFs) that would have been observed if the sensor were indeed at the position obtained by feeding the localization procedure with those estimates [23]. Another example is the coherence of the current set of estimated TOFs (or TDOFs) with one or more spatial regions [45] (see Section 3.3 for more details).
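One possible way of writing the maximum-discrepancy objective mentioned above is the following (the notation is ours, since the original expression is not reproduced here); $\hat{\mathbf{x}}$ denotes the position returned by the localization procedure when fed with the candidate TOFs $\{\hat{\tau}_m\}$:

```latex
J\big(\{\hat{\tau}_m\}\big) \;=\; \max_{m}
\left|\, \hat{\tau}_m - \frac{\lVert \hat{\mathbf{x}} - \mathbf{s}_m \rVert}{c} \,\right| .
```

Small values of $J$ indicate a geometrically consistent set of candidates, whereas a large value points to at least one unreliable TOF.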
The TS strategy begins with an evaluation of the objective function, whose result is checked against a stop criterion. Such a criterion can test whether a threshold is not violated and/or whether the maximum number of iterations is not exceeded. While the criterion is not satisfied, a “Refine” procedure is applied. This refinement is the core of the method: it updates the current set of TOF (or TDOF) candidates and may even discard CCFs/GCCs from which no reliable TOF (or TDOF) estimate can be extracted [23].
5. Proposed ASL System
As the localization techniques described in Section 3 present complementary benefits, the proper choice of the technique to be used strongly depends on the requirements of the particular ASL application. In addition, the recent advances described in Section 4 have proved to work in practice and their combinations with the techniques in Section 3 open up a myriad of exciting research directions that have not been fully explored in the ASL context. This section contains an example of how one can put together the advantage of all techniques in Section 4 along with the region-based search proposed in Section 3.3, giving rise to a novel ASL system.
As mentioned before, SRP-inspired searches present high computational complexity due to the required evaluation of many complex functionals. Converting the search into a selection of correct candidate TOFs among the CCF/GCC peaks is an interesting alternative as long as the number of such peaks is small. This is not the case, however, if each CCF/GCC presents many peaks. Resorting to MP-based algorithms is a convenient way of circumventing this problem: recall that Algorithm 1 returns a small, variable number of candidate delays per CCF/GCC, so that the subsequent selection step only has to handle a few candidates.
Algorithm 2 describes the proposed TOF-selection scheme. It receives the candidate delays produced by the MP stage for each loudspeaker, checks their coherence with spatial regions as in Section 3.3, and iteratively refines or discards candidates that violate the geometric constraints, until a consistent set of TOFs is found and used to estimate the sensor position.
Algorithm 2: TOF selection by geometric constraints. [algorithm listing omitted; refer to PDF]
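The sketch below illustrates one possible “Refine” loop combining MP candidates with the geometric-consistency test; the interface of the localize routine, the tolerance, and the rule for discarding loudspeakers are assumptions made for illustration, not the exact steps of Algorithm 2.

```python
import numpy as np

def predicted_tofs(x, S, c=343.0):
    """TOFs that would be observed if the sensor were at position x."""
    return np.linalg.norm(S - x, axis=1) / c

def select_tofs(cand, S, localize, max_iter=10, tol=1e-3, c=343.0):
    """Greedy TOF selection driven by geometric consistency.

    cand     : list of M arrays, candidate TOFs (seconds) per loudspeaker,
               ordered from most to least promising (e.g., by MP gain).
    S        : (M, 3) loudspeaker positions.
    localize : any routine mapping (tofs, positions) -> position estimate
               (e.g., a TOF-based LS solver); its interface is hypothetical.
    """
    picks = [0] * len(S)                     # start with the best candidate of each CCF
    active = list(range(len(S)))             # loudspeakers still in use
    x_hat = None
    for _ in range(max_iter):
        tofs = np.array([cand[m][picks[m]] for m in active])
        x_hat = localize(tofs, S[active])
        resid = np.abs(tofs - predicted_tofs(x_hat, S[active], c))
        worst = int(np.argmax(resid))
        if resid[worst] < tol:               # geometrically consistent set found
            break
        m = active[worst]
        if picks[m] + 1 < len(cand[m]):
            picks[m] += 1                    # try the next candidate for that CCF
        elif len(active) > 4:
            active.remove(m)                 # give up on this loudspeaker
        else:
            break                            # nothing left to refine
    return x_hat, active, picks
```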
[figure omitted; refer to PDF]
6. Experimental Results
The main goal of this section is to present an evaluation of the proposed method (described in Section 5) in a real-world scenario, as a proof of concept.
6.1. Experimental Setup
The test environment is a lecture room with a measured reverberation time typical of such spaces, in which eleven loudspeakers were installed at the known positions listed in Table 1.
Table 1: Coordinates (in meters) of each loudspeaker (indices 1 to 11). [table values omitted; refer to PDF]
In order to provide an accurate localization procedure, the probe signals should meet several prerequisites, namely, low audibility, high orthogonality, robustness against interference, and short duration (which guarantees a high refresh rate of the sensor location). In the following experiments, a new set of probe signals, called polyphonic chirps, was designed to meet such requirements. The starting point of the polyphonic chirps consists of linear chirps whose bandwidth ranges from 14.0 to 20.0 kHz, with amplitudes following the inverse A-weighting curve (the inverse of the human relative loudness [46]), so that there is an increasing gain from 14 to 20 kHz. One can define 4 subbands from the primary chirp signal, namely, from 14.0 to 15.5 kHz, from 15.5 to 17.0 kHz, from 17.0 to 18.5 kHz, and from 18.5 to 20.0 kHz. Within each subband, one can play a chirp with increasing or decreasing frequency; bit 0 is associated with the latter and bit 1 with the former. Hence, one can assign to each loudspeaker a 4-bit codeword whose bits, from the most to the least significant, are associated with the subchirps from the lowest to the highest frequency band. Therefore, 4 subchirps are simultaneously played back by each loudspeaker, giving rise to a specific polyphonic chirp. Additionally, modifications were made heuristically (in the sense that they were based on the audibility level of the resulting probe signals) to reduce the probe signals’ audibility: the two lower-frequency subchirps are attenuated by factors of 20 (subband 14.0–15.5 kHz) and 10 (subband 15.5–17.0 kHz). Further, each polyphonic chirp follows a 5-ms fade-in/fade-out envelope to hide undesirable discontinuities at the start and at the end of the underlying signals. A cyclical emission of 30-ms polyphonic chirps is performed, following an order previously known by the sensor node, with a 20-ms silence interval between consecutive emissions to reduce the effects of interference and of signal superposition caused by reverberation. It should be emphasized that, through informal listening tests under typical environmental conditions, we found these chirps to be inaudible to most people.
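For reproducibility of the general idea, the sketch below generates one polyphonic chirp following the description above; the 48-kHz sampling rate, the crude stand-in for the inverse-A-weighting shaping, and the example codeword are our assumptions.

```python
import numpy as np

FS = 48000                       # assumed sampling rate (not stated in the paper)
DUR = 0.030                      # 30-ms chirp
FADE = 0.005                     # 5-ms fade-in/fade-out
BANDS = [(14000, 15500), (15500, 17000), (17000, 18500), (18500, 20000)]
ATTEN = [1 / 20, 1 / 10, 1.0, 1.0]   # extra attenuation of the two lower subbands

def linear_chirp(f0, f1, t):
    """Linear chirp sweeping from f0 to f1 over the time vector t."""
    k = (f1 - f0) / t[-1]
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))

def polyphonic_chirp(codeword):
    """Build one polyphonic chirp from a 4-bit codeword (MSB -> lowest band).
    Bit 1: increasing-frequency subchirp; bit 0: decreasing frequency."""
    t = np.arange(int(DUR * FS)) / FS
    sig = np.zeros_like(t)
    for bit, (f_lo, f_hi), a in zip(codeword, BANDS, ATTEN):
        f0, f1 = (f_lo, f_hi) if bit else (f_hi, f_lo)
        # Crude stand-in for the inverse-A-weighting shaping: gain grows with frequency.
        gain = a * (0.5 + 0.5 * (f_lo - 14000) / 6000)
        sig += gain * linear_chirp(f0, f1, t)
    # 5-ms raised-cosine fades to hide discontinuities at the edges.
    n_f = int(FADE * FS)
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n_f) / n_f))
    sig[:n_f] *= ramp
    sig[-n_f:] *= ramp[::-1]
    return sig / np.max(np.abs(sig))

probe_3 = polyphonic_chirp([0, 0, 1, 1])   # e.g., hypothetical codeword 0011
```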
The probe signals were emitted at several different power levels so that the impact of the emission power on the localization accuracy could also be assessed.
6.2. Median Localization Error
This section aims at assessing the impact of different choices of the number of MP decompositions and of the emission power on the median localization error obtained by the proposed method; the results are summarized in Table 2.
Table 2: Median localization error (in cm) for different choices of emission power and number of MP decompositions. [table values omitted; refer to PDF]
6.3. CDF of the Error
In order to evaluate the impact of the emitted power on the localization accuracy, Figure 4 shows the localization error (when it is smaller than 50 cm) and the cumulative distribution function (CDF) of the localization error for the different emission powers tested.
[figures omitted; refer to PDF]
7. Concluding Remarks
This paper serves three purposes. First, it presents a brief review of the vast literature on acoustic sensor localization (ASL) for those beginning in this field. Indeed, a wide range of topics is covered, ranging from the fundamentals of ASL to some state-of-the-art techniques. Second, this paper provides new research directions within the ASL field by explaining how one can borrow concepts from its dual problem, sound source localization (SSL); in this way, many research opportunities open up. Third, this paper proposes a new ASL technique that combines a region-based search (inspired by recently proposed SSL techniques that employ hierarchical searches) and matching pursuit estimation of times-of-flight (TOFs).
Another difference from our previous work [23] is that the ASL technique proposed here works with probe signals which are inaudible for most people, as they have low power and contain only high-frequency components. However, the use of such probe signals makes the TOF estimation a much more challenging task. This explains why robust TOF-estimation techniques are thoroughly discussed throughout the paper.
A real-world experiment was conducted in order to demonstrate that the proposed ASL technique is capable of estimating the position of mobile devices with a median localization error below 20 cm. Since the ultimate goal of many practical ASL systems is to find the position of someone carrying a mobile device, an estimation error below 20 cm is quite acceptable.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
Haddad, Lima, Martins, and Biscainho thank the CAPES, CNPq, and FAPERJ agencies for funding their research. The authors thank Mr. Mauricio V. M. Costa, Mr. Igor M. Quintanilha, and Mr. Felipe B. da Silva for their valuable support in recording the database used in this work.
[1] F. Höflinger, J. Wendeberg, R. Zhang, J. Bührer, M. Hoppe, A. Bannoura, L. Reindl, C. Schindelhauer, "Acoustic self-calibrating system for indoor smartphone tracking (ASSIST)," Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '12), DOI: 10.1109/IPIN.2012.6418877.
[2] O. Vinyals, E. Martin, G. Friedland, "Multimodal indoor localization: an audio-wireless-based approach," Proceedings of the 4th IEEE International Conference on Semantic Computing (ICSC '10), pp. 120-125, DOI: 10.1109/ICSC.2010.87.
[3] N. Gutierrez, C. Belmonte, J. Hanvey, R. Espejo, Z. Dong, "Indoor localization for mobile devices," Proceedings of the 11th IEEE International Conference on Networking, Sensing and Control (ICNSC '14), pp. 173-178, DOI: 10.1109/ICNSC.2014.6819620.
[4] T. Ajdler, I. Kozintsev, R. Lienhart, M. Vetterli, "Acoustic source localization in distributed sensor networks," Proceedings of the Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar '04), pp. 1328-1332, DOI: 10.1109/ACSSC.2004.1399368.
[5] A. Canclini, F. Antonacci, A. Sarti, S. Tubaro, "Acoustic source localization with distributed asynchronous microphone networks," IEEE Transactions on Audio, Speech and Language Processing, vol. 21 no. 2, pp. 439-443, DOI: 10.1109/TASL.2012.2215601, 2013.
[6] K. Hasegawa, N. Ono, S. Miyabe, S. Sagayama, "Blind estimation of locations and time offsets for distributed recording devices," Latent Variable Analysis and Signal Separation, vol. 6365, pp. 57-64, 2010.
[7] C. Peng, G. Shen, Y. Zhang, Y. Li, K. Tan, "BeepBeep: a high accuracy acoustic ranging system using COTS mobile devices," Proceedings of the 5th ACM International Conference on Embedded Networked Sensor Systems (SenSys '07), DOI: 10.1145/1322263.1322265.
[8] M. H. Hennecke, G. A. Fink, "Towards acoustic self-localization of ad hoc smartphone arrays," Proceedings of the 3rd Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA '11), pp. 127-132, DOI: 10.1109/HSCMA.2011.5942378.
[9] A. M. Cavalcante, R. C. D. Paiva, R. Iida, A. Fialho, A. Costa, R. D. Vieira, "Audio beacon providing location-aware content for low-end mobile devices," Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '12), DOI: 10.1109/IPIN.2012.6418897.
[10] P. Pertila, M. Mieskolainen, M. Hamalainen, "Passive self-localization of microphones using ambient sounds," Proceedings of the 20th European Signal Processing Conference (EUSIPCO '12), pp. 1314-1318, 2012.
[11] K. Liu, X. Liu, L. Xie, X. Li, "Towards accurate acoustic localization on a smartphone," Proceedings of the IEEE INFOCOM 2013 - IEEE Conference on Computer Communications, pp. 495-499, DOI: 10.1109/INFCOM.2013.6566822.
[12] F. J. Álvarez, T. Aguilera, R. López-Valcarce, "CDMA-based acoustic local positioning system for portable devices with multipath cancellation," Digital Signal Processing, vol. 62, pp. 38-51, DOI: 10.1016/j.dsp.2016.11.001, 2017.
[13] V. Raykar, R. Duraiswami, "Automatic position calibration of multiple microphones," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 69-72, DOI: 10.1109/ICASSP.2004.1326765.
[14] J. M. Sachar, H. F. Silverman, W. R. Patterson, "Microphone position and gain calibration for a large-aperture microphone array," IEEE Transactions on Speech and Audio Processing, vol. 13 no. 1, pp. 42-52, DOI: 10.1109/TSA.2004.834459, 2005.
[15] S. T. Birchfield, A. Subramanya, "Microphone array position calibration by basis-point classical multidimensional scaling," IEEE Transactions on Speech and Audio Processing, vol. 13 no. 5, pp. 1025-1034, DOI: 10.1109/TSA.2005.851893, 2005.
[16] I. McCowan, M. Lincoln, I. Himawan, "Microphone array shape calibration in diffuse noise fields," IEEE Transactions on Audio, Speech and Language Processing, vol. 16 no. 3, pp. 666-670, DOI: 10.1109/TASL.2007.911428, 2008.
[17] V. C. Raykar, I. V. Kozintsev, R. Lienhart, "Position calibration of microphones and loudspeakers in distributed computing platforms," IEEE Transactions on Speech and Audio Processing, vol. 13 no. 1, pp. 70-83, DOI: 10.1109/TSA.2004.838540, 2005.
[18] L. Wang, T.-K. Hon, J. D. Reiss, A. Cavallaro, "Self-localization of ad-hoc arrays using time difference of arrivals," IEEE Transactions on Signal Processing, vol. 64 no. 4, pp. 1018-1033, DOI: 10.1109/TSP.2015.2498130, 2016.
[19] R. Pfeil, M. Pichler, S. Schuster, F. Hammer, "Robust acoustic positioning for safety applications in underground mining," IEEE Transactions on Instrumentation and Measurement, vol. 64 no. 11, pp. 2876-2888, DOI: 10.1109/TIM.2015.2433631, 2015.
[20] F. J. Álvarez, R. López-Valcarce, "Multipath cancellation in broadband acoustic local positioning systems," DOI: 10.1109/WISP.2015.7139174.
[21] M. Cobos, J. J. Perez-Solano, Ó. Belmonte, G. Ramos, A. M. Torres, "Simultaneous ranging and self-positioning in unsynchronized wireless acoustic sensor networks," IEEE Transactions on Signal Processing, vol. 64 no. 22, pp. 5993-6004, DOI: 10.1109/TSP.2016.2603972, 2016.
[22] D. B. Haddad, L. O. Nunes, W. A. Martins, L. W. P. Biscainho, B. Lee, "Closed-form solutions for robust acoustic sensor localization," Proceedings of the 14th IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA '13), DOI: 10.1109/WASPAA.2013.6701810.
[23] D. B. Haddad, W. A. Martins, M. D. V. M. Da Costa, L. W. P. Biscainho, L. O. Nunes, B. Lee, "Robust acoustic self-localization of mobile devices," IEEE Transactions on Mobile Computing, vol. 15 no. 4, pp. 982-995, DOI: 10.1109/TMC.2015.2439278, 2016.
[24] L. O. Nunes, W. A. Martins, M. V. Lima, L. W. Biscainho, M. V. M. Costa, F. M. Gonçalves, A. Said, B. Lee, "A steered-response power algorithm employing hierarchical search for acoustic source localization using microphone arrays," IEEE Transactions on Signal Processing, vol. 62 no. 19, pp. 5171-5183, DOI: 10.1109/TSP.2014.2336636, 2014.
[25] C. H. Knapp, G. C. Carter, "The generalized correlation method for estimation of time delay," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24 no. 4, pp. 320-327, DOI: 10.1109/tassp.1976.1162830, 1976.
[26] J. Chen, J. Benesty, Y. Huang, "An acoustic MIMO framework for analyzing microphone-array beamforming," Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), pp. 25-28, DOI: 10.1109/ICASSP.2007.366607.
[27] J. DiBiase, High-accuracy, low-latency technique for talker localization in reverberant environments, Ph.D. dissertation, May 2000.
[28] A. Beck, P. Stoica, J. Li, "Exact and approximate solutions of source localization problems," IEEE Transactions on Signal Processing, vol. 56 no. 5, pp. 1770-1778, DOI: 10.1109/TSP.2007.909342, 2008.
[29] P. Stoica, J. Li, "Source Localization from Range-Difference Measurements," IEEE Signal Processing Magazine, vol. 23 no. 6, pp. 63-66, DOI: 10.1109/SP-M.2006.248717, 2006.
[30] H. Do, H. F. Silverman, Y. Yu, "A real-time SRP-PHAT source location implementation using stochastic region contraction (SRC) on a large-aperture microphone array," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 1, pp. 121-124, DOI: 10.1109/ICASSP.2007.366631.
[31] D. N. Zotkin, R. Duraiswami, "Accelerated speech source localization via a hierarchical search of steered response power," IEEE Transactions on Speech and Audio Processing, vol. 12 no. 5, pp. 499-508, DOI: 10.1109/TSA.2004.832990, 2004.
[32] M. V. S. Lima, W. A. Martins, L. O. Nunes, L. W. P. Biscainho, T. N. Ferreira, M. V. M. Costa, B. Lee, "A volumetric SRP with refinement step for sound source localization," IEEE Signal Processing Letters, vol. 22 no. 8, pp. 1098-1102, DOI: 10.1109/LSP.2014.2385864, 2015.
[33] M. Cobos, A. Marti, J. J. Lopez, "A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling," IEEE Signal Processing Letters, vol. 18 no. 1, pp. 71-74, DOI: 10.1109/LSP.2010.2091502, 2011.
[34] A. Antoniou, W.-S. Lu, Practical Optimization: Algorithms and Engineering Applications, 2007.
[35] S. Birchfield, D. Gillmor, "Acoustic source direction by hemisphere sampling," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), pp. 3053-3056, DOI: 10.1109/ICASSP.2001.940302.
[36] J. P. Dmochowski, J. Benesty, S. Affes, "A generalized steered response power method for computationally viable source localization," IEEE Transactions on Audio, Speech and Language Processing, vol. 15 no. 8, pp. 2510-2526, DOI: 10.1109/TASL.2007.906694, 2007.
[37] B. Gendron, T. G. Crainic, "Parallel branch-and-bound algorithms: survey and synthesis," Operations Research, vol. 42 no. 6, pp. 1042-1066, DOI: 10.1287/opre.42.6.1042, 1994.
[38] D. Middleton, "On new classes of matched filters and generalizations of the matched filter concept," IRE Transactions on Information Theory, vol. 6 no. 3, pp. 349-360, DOI: 10.1109/TIT.1960.1057564, 1960.
[39] T. J. Abatzoglou, J. M. Mendel, G. A. Harada, "The constrained total least squares technique and its applications to harmonic superresolution," IEEE Transactions on Signal Processing, vol. 39 no. 5, pp. 1070-1087, DOI: 10.1109/78.80955, 1991.
[40] S. Chandrasekaran, G. H. Golub, M. Gu, A. H. Sayed, "Parameter estimation in the presence of bounded data uncertainties," SIAM Journal on Matrix Analysis and Applications, vol. 19 no. 1, pp. 235-252, DOI: 10.1137/S0895479896301674, 1998.
[41] R. R. Wilcox, Introduction to robust estimation and hypothesis testing, 2011.
[42] S. G. Mallat, Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Transactions on Signal Processing, vol. 41 no. 12, pp. 3397-3415, DOI: 10.1109/78.258082, 1993.
[43] S. Kim, R. A. Iltis, "A matching-pursuit/GSIC-based algorithm for DS-CDMA sparse-channel estimation," IEEE Signal Processing Letters, vol. 11 no. 1, pp. 12-15, DOI: 10.1109/LSP.2003.819349, 2004.
[44] R. Rubinstein, A. M. Bruckstein, M. Elad, "Dictionaries for sparse representation modeling," Proceedings of the IEEE, vol. 98 no. 6, pp. 1045-1057, DOI: 10.1109/JPROC.2010.2040551, 2010.
[45] D. B. Haddad, W. A. Martins, L. W. P. Biscainho, M. D. V. M. Da Costa, K.-H. Kim, "Choosing coherent times of flight for improved acoustic sensor localization," Proceedings of the International Telecommunications Symposium (ITS '14), DOI: 10.1109/ITS.2014.6947978.
[46] J. Parmanen, "A-weighted sound pressure level as a loudness/annoyance indicator for environmental sounds - Could it be improved?," Applied Acoustics, vol. 68 no. 1, pp. 58-70, DOI: 10.1016/j.apacoust.2006.02.004, 2007.
Copyright © 2017 Diego B. Haddad et al. This work is licensed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
The wide availability of mobile devices with embedded microphones opens up opportunities for new applications based on acoustic sensor localization (ASL). Among them, this paper highlights mobile device self-localization relying exclusively on acoustic signals, but with previous knowledge of reference signals and source positions. The problem of finding the sensor position is stated as a function of estimated times-of-flight (TOFs) or time-differences-of-flight (TDOFs) from the sound sources to the target microphone, and the main practical issues involved in TOF estimation are discussed. Least-squares ASL solutions are introduced, followed by other strategies inspired by sound source localization solutions: steered-response power, which improves localization accuracy, and a new region-based search, which alleviates complexity. A set of complementary techniques for further improvement of TOF/TDOF estimates are reviewed: sliding windows, matching pursuit, and TOF selection. The paper proceeds with proposing a novel ASL method that combines most of the previous material, whose performance is assessed in a real-world example: in a typical lecture room, the method achieves accuracy better than 20 cm.
Details
1 Computer Engineering Department, Federal Center for Technological Education (CEFET/RJ), Petropolis, RJ, Brazil
2 DEE-DEL/Poli & PEE/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
3 Advanced Technology Labs, Microsoft, Rio de Janeiro, RJ, Brazil
4 Department of Electronic Engineering, Inha University, Incheon, Republic of Korea