{"title": "Broadband Direction-Of-Arrival Estimation Based on Second Order Statistics", "book": "Advances in Neural Information Processing Systems", "page_first": 775, "page_last": 781, "abstract": null, "full_text": "Broadband Direction-Of-Arrival Estimation Based On Second Order Statistics \n\nJustinian Rosca, Joseph \u00d3 Ruanaidh, Alexander Jourjine, Scott Rickard \n\n{rosca,oruanaidh,jourjine,rickard}@scr.siemens.com \n\nSiemens Corporate Research, Inc. \n755 College Rd E \nPrinceton, NJ 08540 \n\nAbstract \n\nN wideband sources recorded using N closely spaced receivers can feasibly be separated based only on second order statistics when using a physical model of the mixing process. In this case we show that the parameter estimation problem can be essentially reduced to considering directions of arrival and attenuations of each signal. The paper presents two demixing methods operating in the time and frequency domain and experimentally shows that it is always possible to demix signals arriving at different angles. Moreover, one can use spatial cues to solve the channel selection problem and a post-processing Wiener filter to ameliorate the artifacts caused by demixing. \n\n1 Introduction \n\nBlind source separation (BSS) is capable of dramatic results when used to separate mixtures of independent signals. The method relies on simultaneous recordings of signals from two or more input sensors and separates the original sources purely on the basis of statistical independence between them. Unfortunately, BSS literature is primarily concerned with the idealistic instantaneous mixing model. \n\nIn this paper, we formulate a low dimensional and fast solution to the problem of separating two signals from a mixture recorded using two closely spaced receivers. 
Using a physical model of the mixing process reduces the complexity of the model and allows one to identify and to invert the mixing process using second order statistics only. \n\nWe describe the theoretical basis of the new approach, and then focus on two algorithms, which were implemented and successfully applied to extensive sets of real-world data. In essence, our separation architecture is a system of adaptive directional receivers designed using the principles of BSS. The method bears resemblance to methods in beamforming [8] in that it works by spatial filtering. Array processing techniques [2] reduce noise by separating signal space from noise space, which necessitates more receivers than emitters. The main difference is that standard beamforming and array processing techniques [8, 2] are generally concerned strictly with processing directional narrowband signals. The difference with BSS [7, 6] is that our approach is model-based and therefore the elements of the mixing matrix are highly constrained: a feature that aids in the robust and reliable identification of the mixing process. \n\nThe layout of the paper is as follows. Sections 2 and 3 describe the theoretical foundation of the separation method that was pursued. Section 4 presents algorithms that were developed and experimental results. Finally we summarize and conclude this work. \n\n2 Theoretical foundation for the BSS solution \n\nAs a first approximation to the general multi-path model, we use the delay-mixing model. In this model, only direct path signal components are considered. Signal components from one source arrive with a fractional delay between the times of arrival at two receivers. By fractional delays, we mean that delays between receivers are not generally integer multiples of the sampling period. 
The delay depends on the position of the source with respect to the receiver axis and the distance between receivers. Our BSS algorithms demix by compensating for the fractional delays. This, in effect, is a form of adaptive beamforming with directional notches being placed in the direction of sources of interference [8]. A more detailed account of the analytical structure of the solutions can be found in [1]. \n\nBelow we address the case of two inputs and two outputs but there is no reason why the discussion cannot be generalized to multiple inputs and multiple outputs. Assume a linear mixture of two sources, where source amplitude drops off in proportion to distance: \n\n$$x_i(t) = \\frac{1}{R_{i1}} s_1\\left(t - \\frac{R_{i1}}{c}\\right) + \\frac{1}{R_{i2}} s_2\\left(t - \\frac{R_{i2}}{c}\\right) \\tag{1}$$ \n\n$i = 1, 2$, where $c$ is the speed of wave propagation, and $R_{ij}$ indicates the distance from receiver $i$ to source $j$. This describes signal propagation through a uniform non-dispersive medium. In the Fourier domain, Equation 1 results in a mixing matrix $A(\\omega)$ given by: \n\n$$A(\\omega) = \\begin{bmatrix} \\frac{1}{R_{11}} e^{-j\\omega \\frac{R_{11}}{c}} & \\frac{1}{R_{12}} e^{-j\\omega \\frac{R_{12}}{c}} \\\\ \\frac{1}{R_{21}} e^{-j\\omega \\frac{R_{21}}{c}} & \\frac{1}{R_{22}} e^{-j\\omega \\frac{R_{22}}{c}} \\end{bmatrix} \\tag{2}$$ \n\nIt is important to note that the columns can be scaled arbitrarily without affecting separation of sources because rescaling is absorbed into the sources. This implies that row scaling in the demixing matrix (the inverse of $A(\\omega)$) is arbitrary. \n\nUsing the Cosine Rule, $R_{ij}$ can be expressed in terms of the distance $R_j$ of source $j$ to the midpoint between two receivers, the direction of arrival $\\theta_j$ of source $j$, and the distance between receivers, $d$, as follows: \n\n$$R_{ij} = \\left[ R_j^2 + \\left(\\frac{d}{2}\\right)^2 + 2(-1)^i \\left(\\frac{d}{2}\\right) R_j \\cos\\theta_j \\right]^{\\frac{1}{2}} \\tag{3}$$ \n\nExpanding the right term above using the binomial expansion and preserving only zeroth and first order terms, we can express distance from the receivers to the sources as: \n\n$$R_{ij} = \\left( R_j + \\frac{d^2}{8 R_j} \\right) + (-1)^i \\left(\\frac{d}{2}\\right) \\cos\\theta_j \\tag{4}$$ \n\nThis approximation is valid within a 5% relative error when $d \\leq \\frac{R_j}{2}$. 
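As a quick numerical check of the first order expansion, the sketch below compares the cosine rule distance of Equation 3 with the approximation of Equation 4 over a full sweep of arrival angles. The function names and the test geometry (R_j = 1 and d = R_j / 2) are illustrative assumptions rather than part of the original method; under them the worst-case relative error comes out at roughly 4.2%.

```python
import math


def exact_distance(R, d, theta, i):
    # Equation 3: cosine rule distance from receiver i (1 or 2) to a source
    # at range R and arrival angle theta, with receiver spacing d.
    return math.sqrt(R ** 2 + (d / 2) ** 2 + 2 * (-1) ** i * (d / 2) * R * math.cos(theta))


def approx_distance(R, d, theta, i):
    # Equation 4: zeroth and first order terms of the binomial expansion.
    return (R + d ** 2 / (8 * R)) + (-1) ** i * (d / 2) * math.cos(theta)


def max_relative_error(R, d, steps=3600):
    # Worst-case relative error of Equation 4 over a uniform grid of angles
    # and both receivers (grid resolution is an arbitrary choice).
    worst = 0.0
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        for i in (1, 2):
            exact = exact_distance(R, d, theta, i)
            approx = approx_distance(R, d, theta, i)
            worst = max(worst, abs(approx - exact) / exact)
    return worst


# Hypothetical geometry: source range 1.0, receiver spacing 0.5.
error_at_half_range = max_relative_error(1.0, 0.5)
```

The error grows quickly once the spacing approaches the source range, which is why the expansion is only used for closely spaced receivers.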
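With the substitution for $R_{ij}$ and with the redefinition of source $j$ to include the delay due to the term within brackets in Equation 4 divided by $c$, Equation 1 becomes: \n\n$$x_i(t) = \\sum_j \\frac{1}{R_{ij}} s_j\\left(t + (-1)^i \\left(\\frac{d}{2c}\\right) \\cos\\theta_j \\right), \\quad i = 1, 2 \\tag{5}$$ \n\nIn the Fourier domain, Equation 5 results in the simplification to the mixing matrix $A(\\omega)$: \n\n$$A(\\omega) = \\begin{bmatrix} \\frac{1}{R_{11}} e^{-j\\omega\\delta_1} & \\frac{1}{R_{12}} e^{-j\\omega\\delta_2} \\\\ \\frac{1}{R_{21}} e^{j\\omega\\delta_1} & \\frac{1}{R_{22}} e^{j\\omega\\delta_2} \\end{bmatrix} \\tag{6}$$ \n\nHere phases are functions of the directions of arrival $\\theta_j$ (defined with respect to the midpoint between receivers), the distance between receivers $d$, and the speed of propagation $c$: $\\delta_i = \\frac{d}{2c} \\cos\\theta_i$, $i = 1, 2$. $R_{ij}$ are unknown, but we can again redefine sources so that the diagonal elements have unit magnitude: \n\n$$A(\\omega) = \\begin{bmatrix} e^{-j\\omega\\delta_1} & c_1 e^{-j\\omega\\delta_2} \\\\ c_2 e^{j\\omega\\delta_1} & e^{j\\omega\\delta_2} \\end{bmatrix} \\tag{7}$$ \n\nwhere $c_1$, $c_2$ are two positive real numbers. In wireless communications sources are typically distant compared to antenna distance. For distant sources and a well matched pair of receivers $c_1 \\approx c_2 \\approx 1$. Equation 7 describes the mixing matrix for the delay model in the frequency domain, in terms of four parameters, $\\delta_1, \\delta_2, c_1, c_2$. \n\nThe corresponding ideal demixing matrix $W(\\omega)$, for each frequency $\\omega$, is given by: \n\n$$W(\\omega) = A(\\omega)^{-1} = \\frac{1}{\\det A(\\omega)} \\begin{bmatrix} e^{j\\omega\\delta_2} & -c_1 e^{-j\\omega\\delta_2} \\\\ -c_2 e^{j\\omega\\delta_1} & e^{-j\\omega\\delta_1} \\end{bmatrix} \\tag{8}$$ \n\nThe outputs, estimating the sources, are: \n\n$$\\begin{bmatrix} z_1(\\omega) \\\\ z_2(\\omega) \\end{bmatrix} = W(\\omega) \\begin{bmatrix} x_1(\\omega) \\\\ x_2(\\omega) \\end{bmatrix} = \\frac{1}{\\det A(\\omega)} \\begin{bmatrix} e^{j\\omega\\delta_2} & -c_1 e^{-j\\omega\\delta_2} \\\\ -c_2 e^{j\\omega\\delta_1} & e^{-j\\omega\\delta_1} \\end{bmatrix} \\begin{bmatrix} x_1(\\omega) \\\\ x_2(\\omega) \\end{bmatrix} \\tag{9}$$ \n\nMaking the transition back to the time domain results in the following estimate of the outputs: \n\n$$\\begin{aligned} z_1(t) &= h(t, \\delta_1, \\delta_2, c_1, c_2) \\otimes \\left( x_1(t + \\delta_2) - c_1 x_2(t - \\delta_2) \\right) \\\\ z_2(t) &= h(t, \\delta_1, \\delta_2, c_1, c_2) \\otimes \\left( x_2(t - \\delta_1) - c_2 x_1(t + \\delta_1) \\right) \\end{aligned} \\tag{10}$$ \n\nwhere $\\otimes$ is convolution, and $h$ is the impulse response of the filter \n\n$$H(\\omega, \\delta_1, \\delta_2, c_1, c_2) = \\frac{1}{\\det A(\\omega)} = \\frac{1}{e^{-j\\omega(\\delta_1 - \\delta_2)} - c_1 c_2 e^{j\\omega(\\delta_1 - \\delta_2)}} \\tag{11}$$ \n\nFormulae 9 and 10 form the basis for two algorithms to be described next, in the time and frequency domains. The algorithms have the role of determining the four unknown parameters. 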
Note that the filter corresponding to $H(\\omega, \\delta_1, \\delta_2, c_1, c_2)$ should be applied to the output estimates in order to map back to the original inputs. \n\n3 Delay and attenuation compensation algorithms \n\nThe estimation of the four unknown parameters $\\delta_1, \\delta_2, c_1, c_2$ can be carried out based on second order criteria that impose the constraint that outputs are decorrelated ([9, 4, 6, 5]). \n\n3.1 Time and frequency domain approaches \n\nThe time domain algorithm is based on the idea of imposing the decorrelation constraint $\\langle z_1(t), z_2(t) \\rangle = 0$ between the estimates of the outputs, as a function of the delays $D_1$ and $D_2$ and scalar coefficients $c_1$ and $c_2$. This is equivalent to the following criterion: \n\n$$(\\hat{D}_1, \\hat{D}_2, \\hat{c}_1, \\hat{c}_2) = \\arg\\min \\left| F(D_1, D_2, c_1, c_2) \\right| \\tag{12}$$ \n\nwhere $F(\\cdot)$ measures the cross-correlations between the signals given below, representing filtered versions of the differences of fractionally delayed measurements: \n\n$$\\begin{aligned} z_1(t) &= h(t, D_1, D_2, c_1, c_2) \\otimes \\left( x_1(t + D_2) - c_1 x_2(t) \\right) \\\\ z_2(t) &= h(t, D_1, D_2, c_1, c_2) \\otimes \\left( c_2 x_1(t + D_1) - x_2(t) \\right) \\\\ F(D_1, D_2, c_1, c_2) &= \\langle z_1(t), z_2(t) \\rangle \\end{aligned} \\tag{13}$$ \n\nIn the frequency domain, the cross-correlation of the inputs is expressed as follows: \n\n$$R^X(\\omega) = A(\\omega) R^S(\\omega) A^H(\\omega) \\tag{14}$$ \n\nThe mixing matrix in the frequency domain has the form given in Equation 7. Inverting this cross correlation equation yields four equations that are written in matrix form as: \n\n$$R^S(\\omega) = A^{-1}(\\omega) R^X(\\omega) A^{-H}(\\omega) \\tag{15}$$ \n\nSource orthogonality implies that the off-diagonal terms in the covariance matrix must be zero: \n\n$$R^S_{12}(\\omega) = 0, \\quad R^S_{21}(\\omega) = 0 \\tag{16}$$ \n\nFor far field conditions (i.e. the distance between the receivers is much less than the distance from sources) one obtains the following equations: \n\n$$\\begin{aligned} \\frac{a}{b} R^X_{11}(\\omega) + \\frac{b}{a} R^X_{22}(\\omega) - ab R^X_{21}(\\omega) - \\frac{1}{ab} R^X_{12}(\\omega) &= 0 \\\\ \\frac{b}{a} R^X_{11}(\\omega) + \\frac{a}{b} R^X_{22}(\\omega) - ab R^X_{21}(\\omega) - \\frac{1}{ab} R^X_{12}(\\omega) &= 0 \\end{aligned} \\tag{17}$$ \n\nThe terms $a = e^{-j\\omega\\delta_1}$ and $b = e^{-j\\omega\\delta_2}$ are functions of the time delays. Note that there is a pair of equations of this kind for each frequency. 
In practice, the unknowns should be estimated from data at all available frequencies to obtain a robust estimate. \n\n3.2 Channel selection \n\nUp to this point, there was no guarantee that estimated parameters would ensure source separation in some specific order. We could not decide a priori whether estimated parameters for the first output channel correspond to the first or second source. However, the dependence of the phase delays on the angles of arrival suggests a way to break the permutation symmetry in source estimation, that is to decide precisely which estimate to present on the first channel (and hence on the second channel as well). \n\nThe core idea is that directionality and spatial cues provide the information required to break the symmetry. The criterion we use is to sort sources in order of increasing delay. Note that the correspondence between delays and sources is unique when sources are not symmetrical with respect to the receiver axis. When sources are symmetric there is no way of distinguishing between their positions because the cosine of the angles of arrival, and hence the delay, is invariant to the sign of the angle. \n\n4 Experimental results \n\nA robust implementation of criterion 12 averages cross-correlations over a number of windows, of given size. More precisely $F$ is defined as follows: \n\n$$F(\\delta_1, \\delta_2) = \\sum_{\\text{Blocks}} \\left| \\langle z_1(t), z_2(t) \\rangle \\right|^q \\tag{18}$$ \n\nNormally $q = 1$ to obtain a robust estimate. Ngo and Bhadkamkar [5] suggest a similar criterion using $q = 2$ without making use of the determinant of the mixing matrix. 
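The block-averaged criterion of Equation 18 can be sketched as follows. The function name, the plain-list signal representation, and the normalization of each inner product by the block length are assumptions made for illustration; the search over delays and attenuations that would call this cost is not shown.

```python
def block_decorrelation_cost(z1, z2, block_size, q=1):
    # Sum over non-overlapping blocks of |<z1, z2>| ** q (cf. Equation 18).
    if len(z1) != len(z2):
        raise ValueError('z1 and z2 must have the same length')
    total = 0.0
    # Step through the signals one full block at a time; a trailing
    # partial block is ignored.
    for start in range(0, len(z1) - block_size + 1, block_size):
        stop = start + block_size
        # Zero-lag sample cross-correlation within this block
        # (assumed normalization: divide by the block length).
        inner = sum(a * b for a, b in zip(z1[start:stop], z2[start:stop])) / block_size
        total += abs(inner) ** q
    return total


# Toy signals: the first pair is orthogonal within every block,
# the second pair is fully correlated.
alternating = [2.0, -2.0, 2.0, -2.0]
constant = [1.0, 1.0, 1.0, 1.0]
cost_orthogonal = block_decorrelation_cost(alternating, constant, block_size=2)
cost_correlated = block_decorrelation_cost(alternating, alternating, block_size=2)
```

With q = 1 every block contributes the magnitude of its cross-correlation, so a single badly behaved block influences the total less than it would with q = 2.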
\n\nAfter taking into account all terms from Equation 18, including the determinant of the mixing matrix $A$, we obtain the function to be used for parameter estimation in the frequency domain: \n\n$$F(\\delta_1, \\delta_2) = \\sum_{\\omega} \\frac{1}{\\{\\det A\\}^2 + \\eta} \\cdot \\left| \\frac{a}{b} R^X_{11}(\\omega) + \\frac{b}{a} R^X_{22}(\\omega) - ab R^X_{21}(\\omega) - \\frac{1}{ab} R^X_{12}(\\omega) \\right|^q \\tag{19}$$ \n\nwhere $\\eta$ is a (Wiener filter-like) constant that helps prevent singularities and $q$ is normally set to one. \n\nComputing the separated sources using only time differences leads to highpass filtered outputs. In order to implement exactly the theoretical demixing procedure presented one has to divide by the determinant of the mixing matrix. One could filter using the inverse of the determinant to obtain optimal results. This can be implemented in the form of a Wiener filter. The Wiener filter requires knowledge both of the signal and noise power spectral densities. This information is not available to us but a reasonable approximation is to assume that the (wideband) sources have a flat spectral density and the noise corrupting the mixtures is white. In this case, the Wiener filter becomes: \n\n$$H(\\omega) = \\left( \\frac{\\{\\det A(\\omega)\\}^2}{\\{\\det A(\\omega)\\}^2 + \\eta} \\right) \\frac{1}{\\det A(\\omega)} \\tag{20}$$ \n\nwhere the parameter $\\eta$ has been empirically set to the variance of the mixture. Applying this choice of filter usually dramatically improves the quality of the separated outputs. \n\nThe technique of postprocessing using the determinant of the mixing matrix is perfectly general and applies equally well to demixtures computed using matrices of FIR filters. The quality of the result depends primarily on the care with which the inverse filter is implemented. It also depends on the accuracy of the estimate for the mixing parameters. One should avoid using the Wiener filter for near-degenerate mixtures. 
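A minimal sketch of the regularized inverse filter of Equation 20 follows. It assumes the delay-model mixing matrix with unit-magnitude diagonal entries and off-diagonal gains c1 and c2, and it reads the squared determinant in Equation 20 as the squared magnitude of det A, which is an interpretation rather than something the text states. The function names are hypothetical.

```python
import cmath
import math


def det_mixing(omega, delta1, delta2, c1=1.0, c2=1.0):
    # Determinant of the assumed mixing matrix
    # [[exp(-j w d1), c1 exp(-j w d2)], [c2 exp(j w d1), exp(j w d2)]].
    phase = omega * (delta1 - delta2)
    return cmath.exp(-1j * phase) - c1 * c2 * cmath.exp(1j * phase)


def wiener_inverse(omega, delta1, delta2, eta, c1=1.0, c2=1.0):
    # Regularized inverse of the determinant, conj(det) / (|det|^2 + eta).
    # For det != 0 this equals (|det|^2 / (|det|^2 + eta)) * (1 / det),
    # and it stays finite where det vanishes as long as eta > 0.
    det = det_mixing(omega, delta1, delta2, c1, c2)
    return det.conjugate() / (abs(det) ** 2 + eta)


# A phase difference of pi / 2 gives det = -2j, so gain * det is
# 4 / (4 + eta), slightly below one for a small eta.
gain = wiener_inverse(1.0, math.pi / 2, 0.0, eta=0.01)
```

At zero frequency with c1 = c2 = 1 the determinant vanishes and the plain inverse would blow up, whereas the regularized form returns zero there; this is the sense in which the constant eta prevents singularities.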
\n\nThe proof of concept for the theory outlined above was obtained using speech signals, which if anything pose a greater challenge to separation algorithms because of the correlation structure of speech. Two kinds of data are considered in this paper: synthetic direct propagation delay data and synthetic multi-path data. Data can be characterized along two dimensions of difficulty: synthetic vs. real-world, and direct path vs. multi-path. Combinations along these dimensions represented the main type of data we used. \n\nThe value of distance between receivers dictates the order of delays that can appear due to direct path propagation, which is used by the demixing algorithms. Data was generated synthetically employing fractional delays corresponding to the various positions of the sources [3]. \n\nWe modeled multi-path by taking into account the decay in signal amplitude due to propagation distance as well as the absorption of waves. Only the direct path and one additional path were considered. \n\nThe algorithms developed proved successful for separation of two voices from direct path mixtures, even where the sources had very similar spectral power characteristics, and for separation of one source for multi-path mixtures. Moreover, outputs were free from artifacts and were obtained with modest computational requirements. \n\nFigure 1 presents mean separation results of the first and second channels, which correspond to the first and second sources, for various synthetic data sets. Separation depends on the angles of arrival. Plots show no separation in the degenerate case of equal or nearby angles of arrival, but more than 10 dB mean separation in the anechoic case and 5 dB in the multi-path case. \n\nFigure 1: Two sources were positioned at a relatively large distance from a pair of closely spaced receivers. The first source was always placed at zero degrees whilst the second source was moved uniformly from 30 to 330 degrees in steps of 30 degrees. The figure shows mean separation and standard deviation error bars of first and second sources for six synthetic delay mixtures or synthetic multi-path data mixtures using the time and frequency domain algorithms. \n\n5 Conclusions \n\nThe present source separation approach is based on minimization of cross-correlations of the estimated sources, in the time or frequency domains, when using a delay model and explicitly employing direction of arrival. The great advantage of this approach is that it reduces source separation to a decorrelation problem, which is theoretically solved by a system of equations. Although the delay model used generates essentially anechoic time delay algorithms, the results of this work show systematic improvements even when the algorithms are applied to real multi-path data. In all cases separation improvement is robust with respect to the power ratios of sources. \n\nAcknowledgments \n\nWe thank Radu Balan and Frans Coetzee for useful discussions and proofreading various versions of this document and our collaborators within Siemens for providing extensive data for testing. 
\n\nReferences \n\n[1] A. Jourjine, S. Rickard, J. \u00d3 Ruanaidh, and J. Rosca. Demixing of anechoic time delay mixtures using second order statistics. Technical Report SCR-99-TR-657, Siemens Corporate Research, 755 College Road East, Princeton, New Jersey, 1999. \n\n[2] Hamid Krim and Mats Viberg. Two decades of array signal processing research. IEEE Signal Processing Magazine, 13(4), 1996. \n\n[3] Tim Laakso, Vesa V\u00e4lim\u00e4ki, Matti Karjalainen, and Unto Laine. Splitting the unit delay. IEEE Signal Processing Magazine, pages 30-60, 1996. \n\n[4] L. Molgedey and H.G. Schuster. Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett., 72(23):3634-3637, July 1994. \n\n[5] T. J. Ngo and N.A. Bhadkamkar. Adaptive blind separation of audio sources by a physically compact device using second order statistics. In First International Workshop on ICA and BSS, pages 257-260, Aussois, France, January 1999. \n\n[6] Lucas Parra, Clay Spence, and Bert De Vries. Convolutive blind source separation based on multiple decorrelation. In NNSP98, 1998. \n\n[7] K. Torkkola. Blind separation for audio signals: Are we there yet? In First International Workshop on Independent Component Analysis and Blind Source Separation, pages 239-244, Aussois, France, January 1999. \n\n[8] Barry Van Veen and Kevin M. Buckley. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Magazine, 5(2), 1988. \n\n[9] E. Weinstein, M. Feder, and A. Oppenheim. Multi-channel signal separation by decorrelation. IEEE Trans. on Speech and Audio Processing, 1(4):405-413, 1993. 
", "award": [], "sourceid": 1680, "authors": [{"given_name": "Justinian", "family_name": "Rosca", "institution": null}, {"given_name": "Joseph", "family_name": "Ruanaidh", "institution": null}, {"given_name": "Alexander", "family_name": "Jourjine", "institution": null}, {"given_name": "Scott", "family_name": "Rickard", "institution": null}]}