{"title": "Bayesian Regularization and Nonnegative Deconvolution for Time Delay Estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 809, "page_last": 816, "abstract": null, "full_text": " Bayesian Regularization and Nonnegative\n Deconvolution for Time Delay Estimation\n\n Yuanqing Lin, Daniel D. Lee\n GRASP Laboratory, Department of Electrical and Systems Engineering\n University of Pennsylvania, Philadelphia, PA 19104\n linyuanq, ddlee@seas.upenn.edu\n\n Abstract\n\n Bayesian Regularization and Nonnegative Deconvolution (BRAND) is\n proposed for estimating time delays of acoustic signals in reverberant\n environments. Sparsity of the nonnegative filter coefficients is enforced\n using an L1-norm regularization. A probabilistic generative model is\n used to simultaneously estimate the regularization parameters and filter\n coefficients from the signal data. Iterative update rules are derived under\n a Bayesian framework using the Expectation-Maximization procedure.\n The resulting time delay estimation algorithm is demonstrated on noisy\n acoustic data.\n\n1 Introduction\n\nEstimating the time difference of arrival is crucial for binaural acoustic sound source localization [1]. A typical scenario is depicted in Fig. 1, where the azimuthal angle to the sound source is determined by the difference in direct propagation times of the sound to the two microphones. The standard signal processing algorithm for determining the time delay between two signals s(t) and x(t) relies upon computing the cross-correlation function [2]:\n\n C(τ) = ∫ dt x(t) s(t - τ)\n\nand determining the time delay τ that maximizes the cross-correlation. In the presence of uncorrelated white noise, this procedure is equivalent to the optimal matched filter for detection of the time-delayed signal.\n\nHowever, a typical room environment is reverberant and the measured signal is contaminated with echoes from multiple paths as shown in Fig. 1. 
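As a minimal illustration of this baseline, the cross-correlation estimator above can be sketched in a few lines of NumPy. The synthetic white-noise source, the delay of 7 samples, and the noise level are illustrative choices, not values from the paper:

```python
import numpy as np

# Cross-correlation time delay estimation: C(tau) = sum_t x(t) s(t - tau),
# maximized over integer lags. s(t) here is synthetic white noise standing
# in for the speech signal used in the paper.
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)             # source signal s(t)
true_delay = 7                            # illustrative delay, in samples
x = np.roll(s, true_delay)                # delayed observation x(t)
x = x + 0.05 * rng.standard_normal(1000)  # additive white noise

# np.correlate in "full" mode returns C at lags -(N-1), ..., N-1;
# C[lag] = sum_n x[n + lag] * s[n], which peaks at the true delay.
C = np.correlate(x, s, mode="full")
lags = np.arange(-(len(s) - 1), len(s))
tau_hat = lags[np.argmax(C)]
```

With uncorrelated white noise this peak-picking estimate is reliable; in a reverberant room, echoes create competing correlation peaks and the estimate degrades, which motivates the deconvolution formulation that follows.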
In this case, the cross-correlation and related algorithms may not be optimal for estimating the time delays. An alternative approach would be to estimate the multiple time delays as a linear deconvolution problem:\n\n min_α ||x(t) - Σ_i α_i s(t - t_i)||²    (1)\n\nUnfortunately, this deconvolution can be ill-conditioned, resulting in very noisy solutions for the coefficients α. Recently, we proposed incorporating nonnegativity constraints α ≥ 0 in the deconvolution to overcome the ill-conditioned linear solutions [3]. The use of these constraints is justified by acoustic models that describe the theoretical room impulse response with nonnegative filter coefficients [4]. The resulting optimization problem can be written as the nonnegative quadratic programming problem:\n\n min_{α ≥ 0} ||x - Sα||²    (2)\n\nFigure 1: The typical scenario of a reverberant signal. x2(t) comes from the direct path (t2) and echo paths (tE).\n\nFigure 2: Time delay estimation of a speech signal with a) cross-correlation, b) phase alignment transform, c) linear deconvolution, d) nonnegative deconvolution. The observed signal x(t) = s(t - Ts) + 0.5s(t - 8.75Ts) contains an additional time-delayed echo. Ts is the sampling interval.\n\nwhere x = {x(t1) x(t2) . . . x(tN)}ᵀ is an N × 1 data vector, S = {s(t - t1) s(t - t2) . . . 
s(t - tM)} is an N × M matrix, and α is an M × 1 vector of nonnegative coefficients.\n\nFigure 2 compares the performance of cross-correlation, phase alignment transform (a generalized cross-correlation algorithm), linear deconvolution, and nonnegative deconvolution for estimating the time delays in a clean speech signal containing an echo. From the structure of the estimated coefficients, it is clear that nonnegative deconvolution can successfully discover the structure of the time delays present in the signal. However, in the presence of large background noise, it may be necessary to regularize the nonnegative quadratic optimization to prevent overfitting. In this case, we propose using an L1-norm regularization to favor sparse solutions [5]:\n\n min_{α ≥ 0} ||x - Sα||² + λ̂ Σ_i α_i    (3)\n\nIn this formula, the parameter λ̂ (λ̂ ≥ 0) describes the trade-off between fitting the observed data and enforcing sparse solutions. The proper choice of this parameter may be crucial in obtaining the optimal time delay estimates. In the rest of this manuscript, we introduce a proper generative model for these regularization parameters and filter coefficients within a probabilistic Bayesian framework. We show how these parameters can be efficiently determined using appropriate iterative estimates. We conclude by demonstrating and discussing the performance of our algorithm on noisy acoustic signals in reverberant environments.\n\n2 Bayesian regularization\n\nInstead of arbitrarily setting values for the regularization parameters, we show how a Bayesian framework can be used to automatically estimate the correct values from the data. Bayesian regularization has previously been successfully applied to neural network learning [6], model selection, and the relevance vector machine (RVM) [7]. In these works, the fitting coefficients are assumed to have Gaussian priors, which lead to an L2-norm regularization. 
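Before introducing the full Bayesian machinery, the effect of the L1-regularized objective in Eq. 3 can be sketched numerically. The code below minimizes ||x - Sα||² + λ̂ Σ_i α_i over α ≥ 0 with plain projected gradient descent, a generic stand-in solver rather than the specialized methods developed in this paper; the toy matrix S, the sparse coefficient vector, and the value of the trade-off parameter are all assumptions for illustration:

```python
import numpy as np

# Sketch of the L1-regularized nonnegative deconvolution objective,
#   min_{alpha >= 0} ||x - S alpha||^2 + lam * sum(alpha),
# solved by plain projected gradient descent. This is a generic stand-in
# solver; S, alpha_true, and lam below are toy values, not from the paper.
rng = np.random.default_rng(1)
N, M = 200, 20
S = rng.standard_normal((N, M))           # stand-in for shifted copies of s(t)
alpha_true = np.zeros(M)
alpha_true[3], alpha_true[11] = 1.0, 0.5  # sparse "direct path + echo"
x = S @ alpha_true + 0.01 * rng.standard_normal(N)
lam = 0.1                                 # assumed trade-off parameter

alpha = np.full(M, 0.1)                   # feasible starting point
step = 1.0 / (2.0 * np.linalg.norm(S.T @ S, 2))  # 1/L for the smooth term
for _ in range(2000):
    grad = 2.0 * S.T @ (S @ alpha - x) + lam      # gradient of the objective
    alpha = np.maximum(alpha - step * grad, 0.0)  # project onto alpha >= 0
```

With the sparsity penalty active, all but the two true coefficients are driven to (near) zero, mirroring the sparse time-delay structure recovered by nonnegative deconvolution in Fig. 2d.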
In our model, we use an L1-norm sparsity regularization, and a Bayesian framework is used to optimally determine the appropriate regularization parameters.\n\nOur probabilistic model assumes the observed data signal is generated by convolving the source signal with a nonnegative filter describing the room impulse response. This signal is then contaminated by additive Gaussian white noise with zero mean and covariance σ²:\n\n P(x|S, α, σ²) = (2πσ²)^{-N/2} exp[-||x - Sα||²/(2σ²)].    (4)\n\nTo enforce sparseness in the filter coefficients α, an exponential prior distribution is used. This prior only has support in the nonnegative orthant, and the sharpness of the distribution is given by the regularization parameter λ:\n\n P(α|λ) = λ^M Π_{i=1}^M exp(-λα_i),  α ≥ 0.    (5)\n\nIn order to infer the optimal settings of the regularization parameters σ² and λ, Bayes' rule is used to maximize the posterior distribution:\n\n P(λ, σ²|x, S) = P(x|λ, σ², S) P(λ, σ²) / P(x|S).    (6)\n\nAssuming that P(λ, σ²) is relatively flat [8], estimating σ² and λ is then equivalent to maximizing the likelihood:\n\n P(x|λ, σ², S) = λ^M (2πσ²)^{-N/2} ∫_{α≥0} dα exp[-F(α)]    (7)\n\nwhere\n\n F(α) = (1/(2σ²)) (x - Sα)ᵀ(x - Sα) + λ eᵀα    (8)\n\nand e = [1 1 . . . 1]ᵀ.\n\nUnfortunately, the integral in Eq. 7 cannot be directly maximized. Previous approaches to Bayesian regularization have used iterative updates heuristically derived from self-consistent fixed point equations. In our model, the following iterative update rules for λ and σ² can be derived using Expectation-Maximization:\n\n 1/λ ← (1/M) ∫_{α≥0} dα [eᵀα] Q(α)    (9)\n\n σ² ← (1/N) ∫_{α≥0} dα [(x - Sα)ᵀ(x - Sα)] Q(α)    (10)\n\nwhere the expectations are taken over the distribution\n\n Q(α) = exp[-F(α)] / Z,    (11)\n\nwith normalization Z = ∫_{α≥0} dα exp[-F(α)]. These updates have guaranteed convergence properties and can be intuitively understood as iteratively reestimating λ and σ² based upon appropriate expectations over the current estimate for Q(α).\n\n2.1 Estimation of α_ML\n\nThe integrals in Eqs. 
9-10 are dominated by α near α_ML, where the most likely α_ML is given by:\n\n α_ML = arg min_{α ≥ 0} (1/(2σ²)) (x - Sα)ᵀ(x - Sα) + λ eᵀα.    (12)\n\nThis optimization is equivalent to the nonnegative quadratic programming problem in Eq. 3 with λ̂ = 2σ²λ. To efficiently compute α_ML, we have recently developed two distinct methods for optimizing Eq. 12.\n\nThe first method is based upon a multiplicative update rule for nonnegative quadratic programming [9]. We first write the problem in the following form:\n\n min_{α ≥ 0} (1/2) αᵀAα + bᵀα,    (13)\n\nwhere A = (1/σ²) SᵀS and b = λe - (1/σ²) Sᵀx.\n\nFirst, we decompose the matrix A = A⁺ - A⁻ into its positive and negative components such that:\n\n A⁺_ij = A_ij if A_ij > 0, 0 otherwise;  A⁻_ij = -A_ij if A_ij < 0, 0 otherwise.    (14)\n\nThen the following is an auxiliary function that upper bounds Eq. 13 [9]:\n\n G(α, α̃) = bᵀα + (1/2) Σ_i [(A⁺α̃)_i/α̃_i] α_i² - (1/2) Σ_{i,j} A⁻_ij α̃_i α̃_j (1 + ln[α_i α_j/(α̃_i α̃_j)]).    (15)\n\nMinimizing Eq. 15 yields the following iterative multiplicative rule with guaranteed convergence to α_ML:\n\n α_i ← α_i [-b_i + √(b_i² + 4(A⁺α)_i (A⁻α)_i)] / [2(A⁺α)_i].    (16)\n\nThe iterative formula in Eq. 16 is used to efficiently compute a reasonable estimate for α_ML from an arbitrary initialization. However, its convergence is similar to other interior point methods in that small components of α_ML will continually decrease but never equal zero. In order to truly sparsify the solution, we employ an alternative method based upon the simplex algorithm for linear programming.\n\nOur other optimization method is based upon finding a solution α_ML that satisfies the Karush-Kuhn-Tucker (KKT) conditions for Eq. 13:\n\n Aα + b = δ,  α ≥ 0,  δ ≥ 0,  α_i δ_i = 0,  i = 1, 2, . . . , M.    (17)\n\nBy introducing additional artificial variables a, the KKT conditions can be transformed into the linear optimization min Σ_i a_i subject to the constraints:\n\n a ≥ 0    (18)\n α ≥ 0    (19)\n δ ≥ 0    (20)\n Aα - δ + sign(-b) a = -b    (21)\n α_i δ_i = 0,  i = 1, 2, . . . , M. 
(22)\n\nThe only nonlinear constraint is the complementarity condition α_i δ_i = 0. However, this can be effectively implemented in the simplex procedure by modifying the selection of the pivot element to ensure that α_i and δ_i are never both in the set of basic variables. With this simple modification of the simplex algorithm, the optimal α_ML can be efficiently computed.\n\n2.2 Approximation of Q(α)\n\nOnce the most likely α_ML has been determined, the simplest approach for estimating the new λ and σ² in Eqs. 9-10 is to replace Q(α) ≈ δ(α - α_ML) in the integrals. Unfortunately, this simple approximation will cause λ and σ² to diverge from bad initial estimates. To overcome these difficulties, we use a slightly more sophisticated method of estimating the expectations to properly consider variability in the distribution Q(α).\n\nWe first note that the solution α_ML of the nonnegative quadratic optimization in Eq. 12 naturally partitions the elements of the vector α into two distinct subsets I and J, consisting of components i ∈ I such that (α_ML)_i = 0, and components j ∈ J such that (α_ML)_j > 0, respectively. It will then be useful to approximate the distribution Q(α) as the factored form:\n\n Q(α) ≈ Q_I(α_I) Q_J(α_J)    (23)\n\nConsider the components α_J near the maximum likelihood solution α_ML. Among these components, none of the nonnegativity constraints are active, so it is reasonable to approximate the distribution Q_J(α_J) by the unconstrained Gaussian:\n\n Q_J(α_J) ∝ exp[-F(α_J | α_I = 0)]    (24)\n\nThis Gaussian distribution has mean (α_ML)_J and inverse covariance given by the submatrix A_JJ of A = (1/σ²) SᵀS.\n\nFor the other components α_I, it is important to consider the nonnegativity constraints, since (α_ML)_I = 0 is on the boundary of the distribution. We can represent Q_I(α_I) with the first two orders of its Taylor expansion:\n\n Q_I(α_I) ∝ exp{-[(∂F/∂α)|_{α_ML}]_Iᵀ α_I - (1/2) α_Iᵀ A_II α_I}\n      = exp[-(Aα_ML + b)_Iᵀ α_I - (1/2) α_Iᵀ A_II α_I],  α_I ≥ 0. 
(25)\n\nQ_I(α_I) is then approximated with a factorial exponential distribution Q̂_I(α_I) so that the integrals in Eqs. 9-10 can be easily evaluated:\n\n Q̂_I(α_I) = Π_{i∈I} (1/μ_i) exp(-α_i/μ_i),  α_I ≥ 0,    (26)\n\nwhich has support only for nonnegative α_I ≥ 0. The mean-field parameters μ are optimally obtained by minimizing the KL divergence:\n\n min_{μ ≥ 0} ∫_{α_I≥0} dα_I Q̂_I(α_I) ln[Q̂_I(α_I)/Q_I(α_I)].    (27)\n\nThis integral can easily be computed in terms of the parameters μ and yields the minimization:\n\n min_{μ ≥ 0} -Σ_{i∈I} ln μ_i + b̂_Iᵀ μ + (1/2) μᵀ Â μ,    (28)\n\nwhere b̂_I = (Aα_ML + b)_I and Â = A_II + diag(A_II). To solve this minimization problem, we use an auxiliary function for Eq. 28 similar to the auxiliary function for nonnegative quadratic programming:\n\n G(μ, μ̃) = -Σ_{i∈I} ln μ_i + b̂_Iᵀ μ + (1/2) Σ_{i∈I} [(Â⁺μ̃)_i/μ̃_i] μ_i² - (1/2) Σ_{i,j∈I} Â⁻_ij μ̃_i μ̃_j (1 + ln[μ_i μ_j/(μ̃_i μ̃_j)]),    (29)\n\nwhere Â = Â⁺ - Â⁻ is the decomposition of Â into its positive and negative components. Minimization of this auxiliary function yields the following multiplicative update rule for μ_i:\n\n μ_i ← μ_i [-b̂_i + √(b̂_i² + 4(Â⁺μ)_i [(Â⁻μ)_i + 1/μ_i])] / [2(Â⁺μ)_i].    (30)\n\nThese iterations are then guaranteed to converge to the optimal mean-field parameters μ for the distribution Q_I(α_I).\n\nGiven the factorized approximation Q̂_I(α_I) Q_J(α_J), the expectations in Eqs. 9-10 can be analytically calculated. The mean value of α under this distribution is given by:\n\n ᾱ_i = (α_ML)_i if i ∈ J;  μ_i if i ∈ I,    (31)\n\nand its covariance C is:\n\n C_ij = (A_JJ⁻¹)_ij if i, j ∈ J;  μ_i² δ_ij otherwise.    (32)\n\nThe update rules for λ and σ² are then given by:\n\n λ ← M / Σ_i ᾱ_i    (33)\n\n σ² ← (1/N) [(x - Sᾱ)ᵀ(x - Sᾱ) + Tr(SᵀS C)]    (34)\n\nTo summarize, the complete algorithm consists of the following steps:\n\n 1. Initialize λ and σ².\n\n 2. Determine α_ML by solving the nonnegative quadratic programming problem in Eq. 12.\n\n 3. 
Approximate the distribution Q(α) ≈ Q̂_I(α_I) Q_J(α_J) by solving the mean-field equations for μ in Q̂_I.\n\n 4. Calculate the mean ᾱ and covariance C for this distribution.\n\n 5. Reestimate the regularization parameters λ and σ² using Eqs. 33-34.\n\n 6. Go back to Step 2 until convergence.\n\nFigure 3: Iterative estimation of λ (plotted as M/λ, indicating the reverberation level) and σ² when x(t) is contaminated by background white noise at -5 dB, -20 dB, and -40 dB levels. The horizontal dotted lines indicate the true levels.\n\n3 Results\n\nWe illustrate the performance of our algorithm in estimating the regularization parameters as well as the nonnegative filter coefficients of a speech source signal s(t). The observed signal x(t) is simulated by a time-delayed version of the source signal mixed with an echo, along with additive Gaussian white noise ν(t):\n\n x(t) = s(t - Ts) + 0.5 s(t - 16.5Ts) + ν(t).    (35)\n\nWe compare the results of the algorithm as the noise level is changed. Fig. 3 shows the convergence of the estimates for λ and σ² as the noise level is varied between -5 dB and -40 dB. There is rapid convergence of both parameters even with bad initial estimates. The resulting value of the σ² parameter is very close to the true noise level. Additionally, the estimated λ parameter is inversely related to the reverberation level of the environment, given by the sum of the true filter coefficients.\n\nFig. 
4 demonstrates the importance of correctly determining the regularization parameters in estimating the time delay structure in the presence of noise. Using the Bayesian regularization procedure, the resulting estimate for α_ML correctly models the direct path time delay as well as the secondary echo. However, if the regularization parameters are manually set incorrectly to over-sparsify the solution, the resulting estimates for the time delays may be quite inaccurate.\n\nFigure 4: Estimated time delay structure from α_ML with different regularizations: a) Bayesian regularization (λ = 50, σ² = 0.12); b) regularization parameter manually set to 200. Dotted lines indicate the true positions of the time delays.\n\n4 Discussion\n\nIn summary, we propose using a Bayesian framework to automatically regularize nonnegative deconvolutions for estimating time delays in acoustic signals. We present two methods for efficiently solving the resulting nonnegative quadratic programming problem. We also derive an iterative algorithm from Expectation-Maximization to estimate the regularization parameters. We show how these iterative updates can simultaneously estimate the time-delay structure in the signal, as well as the background noise level and reverberation level of the room. Our results indicate that the algorithm is able to quickly converge to an optimal solution, even with bad initial estimates. Preliminary tests with an acoustic robotic platform indicate that these algorithms can successfully be implemented on a real-time system.\n\nWe are currently working to extend the algorithm to the situation where the source signal needs to also be estimated. In this case, priors for the source signal are used to regularize the source estimates. 
These priors are similar to those used for blind source separation. We are investigating algorithms that can simultaneously estimate the hyperparameters for these priors in addition to the other parameters within a consistent Bayesian framework.\n\nReferences\n\n[1] E. Ben-Reuven and Y. Singer, \"Discriminative binaural sound localization,\" in Advances in Neural Information Processing Systems, S. Becker, S. Thrun, and K. Obermayer, Eds., vol. 15. The MIT Press, 2002.\n\n[2] C. H. Knapp and G. C. Carter, \"The generalized correlation method for estimation of time delay,\" IEEE Transactions on ASSP, vol. 24, no. 4, pp. 320-327, 1976.\n\n[3] Y. Lin, D. D. Lee, and L. K. Saul, \"Nonnegative deconvolution for time of arrival estimation,\" in ICASSP, 2004.\n\n[4] J. B. Allen and D. A. Berkley, \"Image method for efficiently simulating small-room acoustics,\" J. Acoust. Soc. Am., vol. 65, pp. 943-950, 1979.\n\n[5] B. Olshausen and D. Field, \"Emergence of simple-cell receptive field properties by learning a sparse code for natural images,\" Nature, vol. 381, pp. 607-609, 1996.\n\n[6] D. Foresee and M. Hagan, \"Gauss-Newton approximation to Bayesian learning,\" in Proceedings of the 1997 International Joint Conference on Neural Networks, 1997, pp. 1930-1935.\n\n[7] M. E. Tipping, \"Sparse Bayesian learning and the relevance vector machine,\" Journal of Machine Learning Research, vol. 1, pp. 211-244, 2001.\n\n[8] D. MacKay, \"Bayesian interpolation,\" Neural Computation, vol. 4, pp. 415-447, 1992.\n\n[9] F. Sha, L. K. Saul, and D. D. Lee, \"Multiplicative updates for nonnegative quadratic programming in support vector machines,\" in Advances in Neural Information Processing Systems, S. Becker, S. Thrun, and K. Obermayer, Eds., vol. 15. The MIT Press, 2002.\n", "award": [], "sourceid": 2710, "authors": [{"given_name": "Yuanqing", "family_name": "Lin", "institution": null}, {"given_name": "Daniel", "family_name": "Lee", "institution": null}]}