{"title": "Adaptive Template Matching with Shift-Invariant Semi-NMF", "book": "Advances in Neural Information Processing Systems", "page_first": 921, "page_last": 928, "abstract": "How does one extract unknown but stereotypical events that are linearly superimposed within a signal with variable latencies and variable amplitudes? One could think of using template matching or matching pursuit to find the arbitrarily shifted linear components. However, traditional matching approaches require that the templates be known a priori. To overcome this restriction we use instead semi Non-Negative Matrix Factorization (semi-NMF) that we extend to allow for time shifts when matching the templates to the signal. The algorithm estimates templates directly from the data along with their non-negative amplitudes. The resulting method can be thought of as an adaptive template matching procedure. We demonstrate the procedure on the task of extracting spikes from single channel extracellular recordings. On these data the algorithm essentially performs spike detection and unsupervised spike clustering. Results on simulated data and extracellular recordings indicate that the method performs well for signal-to-noise ratios of 6dB or higher and that spike templates are recovered accurately provided they are sufficiently different.", "full_text": "Adaptive Template Matching with\n\nShift-Invariant Semi-NMF\n\nJonathan Le Roux\n\nGraduate School of Information\n\nScience and Technology\nThe University of Tokyo\n\nAlain de Cheveign\u00b4e\n\nCNRS, Universit\u00b4e Paris 5,\n\nand Ecole Normale Sup\u00b4erieure\nAlain.de.Cheveigne@ens.fr\n\nleroux@hil.t.u-tokyo.ac.jp\n\nLucas C. Parra\u2044\n\nBiomedical Engineering\nCity College of New York\n\nCity University of New York\n\nparra@ccny.cuny.edu\n\nAbstract\n\nHow does one extract unknown but stereotypical events that are linearly su-\nperimposed within a signal with variable latencies and variable amplitudes?\nOne could think of using template matching or matching pursuit to \ufb01nd\nthe arbitrarily shifted linear components. However, traditional matching\napproaches require that the templates be known a priori. To overcome this\nrestriction we use instead semi Non-Negative Matrix Factorization (semi-\nNMF) that we extend to allow for time shifts when matching the templates\nto the signal. The algorithm estimates templates directly from the data\nalong with their non-negative amplitudes. The resulting method can be\nthought of as an adaptive template matching procedure. We demonstrate\nthe procedure on the task of extracting spikes from single channel extra-\ncellular recordings. On these data the algorithm essentially performs spike\ndetection and unsupervised spike clustering. Results on simulated data and\nextracellular recordings indicate that the method performs well for signal-\nto-noise ratios of 6dB or higher and that spike templates are recovered\naccurately provided they are su\ufb03ciently di\ufb00erent.\n\n1 Introduction\n\nIt is often the case that an observed waveform is the superposition of elementary wave-\nforms, taken from a limited set and added with variable latencies and variable but positive\namplitudes. Examples are a music waveform, made up of the superposition of stereotyped\ninstrumental notes, or extracellular recordings of nerve activity, made up of the super-\nposition of spikes from multiple neurons.\nIn these examples, the elementary waveforms\ninclude both positive and negative excursions, but they usually contribute with a positive\nweight. Additionally, the elementary events are often temporally compact and their oc-\ncurrence temporally sparse. Conventional template matching uses a known template and\ncorrelates it with the signal; events are assumed to occur at times where the correlation is\nhigh. Multiple template matching raises combinatorial issues that are addressed by Match-\ning Pursuit [1]. However these techniques assume a preexisting dictionary of templates. We\n\n\u2217\n\nCorresponding author\n\n\fwondered whether one can estimate the templates directly from the data, together with\ntheir timing and amplitude.\nOver the last decade a number of blind decomposition methods have been developed that\naddress a similar problem: given data, can one \ufb01nd the amplitudes and pro\ufb01les of constituent\nsignals that explain the data in some optimal way. This includes independent component\nanalysis (ICA), non-negative matrix factorization (NMF), and a variety of other blind source\nseparation algorithms. The di\ufb00erent algorithms all assume a linear superposition of the\ntemplates, but vary in their speci\ufb01c assumptions about the statistics of the templates and\nthe mixing process. These assumptions are necessary to obtain useful results because the\nproblem is under-constrained.\nICA does not \ufb01t our needs because it does not implement the constraint that compo-\nnents (templates) are added with positive weights. NMF constrains weights to be non-\nnegative but requires templates to also be non-negative. We will use instead the semi-NMF\nalgorithm of Chris Ding [2, 3] that allows factoring a matrix into a product of a non-negative\nand an arbitrary matrix. To accommodate time shifts we modify it following the ideas of\nMorten M\u00f8rup [4] who presented a shift-invariant version of the NMF algorithm, that also\nincludes sparsity constraints. We begin with the conventional formulation of the NMF mod-\neling task as a matrix factorization problem and then derive in the subsequent section the\ncase of a 1D sequence of data. NMF models a data matrix X as a factorization,\n\n\u02c6X = AB ,\n\n(1)\nwith A \u2265 0 and B \u2265 0 and \ufb01nds these coe\ufb03cients such that the square modeling error\n||X \u2212 \u02c6X||2 is minimized. Matrix A can be thought of as component amplitudes and the rows\nof matrix B are the component templates. Semi-NMF drops the non-negative constraint\nfor B, while shift-NMF allows the component templates to be shifted in time. In the NMF\nalgorithm, there is an update equation for A and an update equation for B. Semi-NMF and\nshift-NMF each modi\ufb01es one of these equations, fortunately not the same, so their updates\ncan be interleaved without interference.\n\n2 Review of semi-NMF\n\nAssume we are given N observations or segments of data with T samples arranged as\na matrix Xnt. (The segments can also represent di\ufb00erent epochs, trials, or even di\ufb00erent\nchannels.) The goal is to model this data as a linear superposition of K component templates\nBkt with amplitudes Ank, i.e.,\n\n\u02c6Xnt =\n\nAnkBkt = Ak\n\nnBkt .\n\n(2)\n\nX\n\nk\n\nThe second expression here uses Einstein notation: indices that appear both as superscript\nand subscript within a product are to be summed.\nIn contrast to matrix notation, all\ndimensions of an expression are apparent, including those that are absorbed by a sum, and\nthe notation readily extends to more than two dimensions, which we will need when we\nintroduce delays. We use this notation throughout the paper and include explicit sum signs\nonly to avoid possible confusion.\nNow, to minimize the modeling error\n\nE = ||X \u2212 \u02c6X||2\n\n2 =\n\nnBkt\n\n,\n\n(3)\n\n\u00a1\nXnt \u2212 Ak\n\nX\n\nnt\n\n\u00a22\n\nthe semi-NMF algorithm iterates between \ufb01nding the optimum B for a given A, which is\ntrivially given by the classic least squares solution,\n\nand improving the estimate of A for a given B with the multiplicative update\n\nBkt = (An\n\nk Ak0n)\n\n\u00a11 Ak\n\n0\nn X n\nt ,\n\ns\n\nAnk \u2190 Ank\n\n(X t\n(X t\n\nnBkt)+ + Ak0\n\u00a1\n+ Ak0\nnBkt)\n\nn (Bt\nn (Bt\n\n\u00a1\nk0Bkt)\nk0Bkt)+ .\n\n(4)\n\n(5)\n\n\f2(|M| + M) and (M)\u00a1 = 1\n\n0 is a summation index; (M)\u00a11 stands for matrix inverse of M; and,\nIn these expressions, k\n2(|M| \u2212 M) are to be applied on each element of matrix\n(M)+ = 1\nM. The multiplicative update (5) ensures that A remains non-negative in each step; while,\nbaring constraints for B, the optimum solution for B for a given A is found in a single step\nwith (4).\n\n3 Shift-invariant semi-NMF\n\n3.1 Formulation of the model for a 1D sequence\n\nConsider now the case where the data is given as a 1-dimensional time sequence Xt. In\nthe course of time, various events of unknown identity and variable amplitude appear in\nthis signal. We describe an event of type k with a template Bkl of length L. Time index l\nrepresents now a time lag measured from the onset of the template. An event can occur at\nany point in time, say at time sample n, and it may have a variable amplitude. In addition,\nwe do not know a priori what the event type is and so we assign to each time sample n and\neach event type k an amplitude Ank \u2265 0. The goal is to \ufb01nd the templates B and amplitudes\nA that explain the data. In this formulation of the model, the timing of an event is given\nby a non-zero sample in the amplitude matrix A. Ideally, each event is identi\ufb01ed uniquely\nand is well localized in time. This means that for a given n the estimated amplitudes are\npositive for only one k, and neighboring samples in time have zero amplitudes. This new\nmodel can be written as\n\n\u02c6Xt =\n\n=\n\nnBk,t\u00a1n\n\nAk\n\nn\u03b4n,t\u00a1lBkl .\n\nn\n\nAk\n\nX\nX\nX\nX\n\nn\n\nl\n\n(6)\n\n(7)\n\n(10)\n\nThe Kronecker delta \u03b4tl was used to induce the desired shifts n. We can dispense with the\ncumbersome shift in the index if we introduce\n\n\u02dcAtkl =\n\n(8)\nThe tensor \u02dcAtkl represents a block Toeplitz matrix, with K blocks of dimension T \u00d7 L. Each\nblock implements a convolution of the k-th template Bkl with amplitudes signal Ank. With\nthis de\ufb01nition the model is written now simply as:\n\nAnk\u03b4n,t\u00a1l .\n\nn\n\n(9)\nwith Ank \u2265 0. We will also require a unit-norm constraint on the K templates in B, namely,\nkBkl = 1, to disambiguate the arbitrary scale in the product of A and B.\nBl\n\nt Bkl ,\n\n\u02c6Xt = \u02dcAkl\n\n3.2 Optimization criterion with sparseness prior\n\nUnder the assumption that the data represent a small set of well-localized events, matrix\nA should consist of a sparse series of pulses, the other samples having zero amplitude. To\nfavor solutions having this property, we use a generalized Gaussian distribution as prior\nprobability for the amplitudes. Assuming Gaussian white noise, the new cost function given\nby the negative log-posterior reads (up to a scaling factor),\n\nE =\n\n\u2021\nX\n||X \u2212 \u02c6X||2\n\n\u00b72\n2 + \u03b2 ||A||\u03b1\n\n\u03b1\n\nXt \u2212 \u02dcAkl\n\n1\n2\n1\n2\n\nX\n\n=\n\nt Bkl\n\n(11)\nwhere ||\u00b7||p denotes the Lp norm (or quasi-norm for 0 < p < 1). The shape parameter \u03b1 of the\ngeneralized Gaussian distribution controls the odds of observing low versus high amplitude\nvalues and should be chosen based on the expected rate of events. For our data we mostly\nchoose \u03b1 = 1/4. The parameter \u03b2 is a normalization constant which depends on the power\nof the noise, \u03c32\n\nN , and the power of the amplitudes, \u03c32\n\nA, with \u03b2 = \u03c32\n\n\u00a2\u03b1/2.\n\n\u0393(3/\u03b1)/\u0393(1/\u03b1)\n\n+ \u03b2\n\n\u00a1\n\nA\u03b1\n\nkl ,\n\nkl\n\nt\n\nN\n\u03c3\u03b1\nA\n\n\f3.3 A update\n\nThe update for A which minimizes this cost function is similar to update (5) with some\nmodi\ufb01cations. In (5), amplitudes A can be treated as a matrix of dimensions T \u00d7 K and\neach update can be applied separately for every n. Here the problem is no longer separable\nin n and we need to treat A as a 1 \u00d7 T K matrix. B is now a T K \u00d7 T matrix of shifted\ntemplates de\ufb01ned as \u02dcBnkt = Bk,t\u00a1n. The new update equation is similar to (5), but di\ufb00ers\nin the term BBT :\n\nvuuuut\n\n\u2021\n\n\u00b7+\n\n\u2021\n\n\u02dcX l\n\n\u00b7\u00a1\n\nnBkl\n\n\u02dcX l\n\nnBkl\n\n+ An0k0\n\n\u2021\n\n\u00b7\u00a1\n\n\u2021\n\n\u00b7+\n\n+ An0k0\n\n\u02dcBt\nn0k0 \u02dcBnkt\n\u02dcBt\nn0k0 \u02dcBnkt\n\n+ \u03b1\u03b2A\u03b1\u00a11\n\nnk\n\nAnk \u2190 Ank\n\n.\n\n(12)\n\nn = Xn+l, and the time index in the summation \u02dcX l\n\nThe summation in the BBT term is over t, and is 0 most of the time when the events do not\noverlap. We also de\ufb01ned \u02dcX l\nnBkl extends\nonly over lags l from 0 to L\u22121. To limit the memory cost of this operation, we implemented\nit by computing only the non-zero parts of the T K \u00d7 T K matrix BBT as 2L \u2212 1 blocks of\nsize K \u00d7 K. The extra term in the denominator of (12) is the gradient of the sparseness\nterm in (11). A convergence proof for (12) can be obtained by modifying the convergence\nproof of the semi-NMF algorithm in [2] to include the extra L\u03b1 norm as penalty term. The\nproof relies on a new inequality on the L\u03b1 norm recently introduced by Kameoka to prove\nthe convergence of his complex-NMF framework [5].\n\n3.4 B update\n\nThe templates B that minimize the square modeling error, i.e., the \ufb01rst term of the cost\nfunction (11), are given by a least-squares solution which now writes:\n\n\u2021\n\n\u00b7\u00a11 \u02dcAk\n\nBkl =\n\n\u02dcAt\n\nkl\n\n\u02dcAtk0l0\n\n0\n\n0\nt X t .\n\nl\n\n(13)\n\nThe matrix inverse is now over a matrix of LK by LK elements. Note that the sparseness\nprior will act to reduce the magnitude of A. Any scaling of A can be compensated by a\ncorresponding inverse scaling of B so that the \ufb01rst term of the cost function remains unaf-\nfected. The unit-norm constraint for the templates B therefore prevents A from shrinking\narbitrarily.\n\n3.5 Normalization\n\n\u2021\n\n\u00b7\u00a11 \u02dcAk\n\nThe normalization constraint of the templates B can be implemented using Lagrange mul-\ntipliers, leading to the constrained least squares solution:\n\n\u02dcAt\n\n\u02dcAtk0l0 + \u039bkl,k0l0\n\nkl\n\nBkl =\n\n(14)\nHere, \u039bkl,k0l0 represents a diagonal matrix of size KL \u00d7 KL with K di\ufb00erent Lagrange\nkBkl = 1 for all k. This can be\nmultipliers as parameters that need to be adjusted so that Bl\nkBkl \u2212 1. The K\ndone with a Newton-Raphson root search of the K functions fk(\u039b) = Bl\ndimensional search for the Lagrange multipliers in \u039b can be interleaved with updates of A\nand B. For simplicity however, in our \ufb01rst implementation we used the unconstrained least\nsquares solution (\u039b = 0) and renormalized B and A every 10 iterations.\n\nl\n\n0\n\n0\nt X t .\n\n4 Performance evaluations\n\nWe evaluated the algorithm on synthetic and real data. Synthetic data are used to provide\na quantitative evaluation of performance as a function of SNR and the similarity of di\ufb00erent\ntemplates. The algorithm is then applied to extracellular recordings of neuronal spiking\nactivity and we evaluate its ability to recover two distinct spike types that are typically\nsuperimposed in this data.\n\n\fFigure 1: Example of synthetic spike trains and estimated model parameters at an SNR\nof 2 (6 dB). Top left: synthetic data. Bottom left: synthetic parameters (templates B and\nweight matrices A). Top right: reconstructed data. Bottom right: estimated parameters.\n\n4.1 Quantitative evaluation on synthetic data\n\nThe goal of these simulations is to measure performance based on known truth data. We\nreport detection rate, false-alarm rate, and classi\ufb01cation error. In addition we report how\naccurately the templates have been recovered. We generated synthetic spike trains with\ntwo types of \u201cspikes\u201d and added Gaussian white noise. Figure 1 shows an example for\nSNR = \u03c3A/\u03c3N = 2 (or 6 dB). The two sets of panels show the templates B (original\non the left and recovered on the right), amplitudes A (same as above) and noisy data\nX (left) and estimated \u02c6X (right). The \ufb01gure shows the model parameters which resulted in\na minimum cost. Clearly, for this SNR the templates have been recovered accurately and\ntheir occurrences within the waveform have been found with only a few missing events.\nPerformance as a function of varying SNR is shown in Figure 2. Detection rate is measured\nas the number of events recovered over the total number of events in the original data.\nFalse-alarms occur when noise is interpreted as actual events. Presence or absence of a\nrecovered event is determined by comparing the original pulse train with the reconstructed\npulse train A (channel number k is ignored). Templates in this example have a correlation\ntime (3 dB down) of 2-4 samples and so we tolerate a misalignment of events of up to\n\u00b12 samples. We simulated 30 events with amplitudes uniformly distributed in [0, 1]. The\nalgorithm tends to miss smaller events with amplitudes comparable to the noise amplitude.\nTo capture this e\ufb00ect, we also report a detection rate that is weighted by event amplitude.\nSome events may be detected but assigned to the wrong template. We therefore report also\nclassi\ufb01cation performance. Finally, we report the goodness of \ufb01t as R2 for the templates B\nand the continuous valued amplitudes A for the events that are present in the original data.\nNote that the proposed algorithm implements implicitly a clustering and classi\ufb01cation pro-\ncess. Obviously, the performance of this type of unsupervised clustering will degrade as the\ntemplates become more and more similar. Figure 2 shows the same performance numbers\nas a function of the similarity of the templates (without additive noise). A similarity of 0\ncorresponds to the templates shown as examples in Figure 1 (these are almost orthogonal\nwith a cosine of 74\u2013), and similarity 1 means identical templates. Evidently the algorithm\nis most reliable when the target templates are dissimilar.\n\n4.2 Analysis of extracellular recordings\n\nThe original motivation for this algorithm was to analyze extracellular recordings from\nsingle electrodes in the guinea pig cochlear nucleus. Spherical and globular bushy cells in\nthe anteroventral cochlear nucleus (AVCN) are assumed to function as reliable relays of\nspike trains from the auditory nerve, with \u201cprimary-like\u201d responses that resemble those of\nauditory nerve \ufb01bers. Every incoming spike evokes a discharge within the outgoing axon [6].\n\n12505007501000\u2212101AmplitudeNoisy dataTime (sample)11325Templates BLag12505007501000Time (sample)Event amplitudes A12505007501000\u2212101AmplitudeReconstructed waveformTime (sample)11530Templates BLag12505007501000Time (sample)Event amplitudes A\fFigure 2: Left graph: performance as a function of SNR. Error bars represent standard\ndeviation over 100 repetitions with varying random amplitudes and random noise. Top left:\ndetection rate. Top center: weighted detection rate. Top right: misclassi\ufb01cation rate (events\nattributed to the wrong template). Bottom left: false alarm rate (detected events which\ndo not correspond to an event in the original data). Bottom center: R2 of the templates\nB. Bottom right: R2 of the amplitudes A. Right graph: same as a function of similarity\nbetween templates.\n\nHowever, recent observations give a more nuanced picture, suggesting that the post-synaptic\nspike may sometimes be suppressed according to a process that is not well understood [7].\nExtracellular recordings from primary-like cells within AVCN with a single electrode typi-\ncally show a succession of events made up of three sub-events: a small pre-synaptic spike\nfrom the large auditory nerve \ufb01ber terminal, a medium-sized post-synaptic spike from the\ninitial segment of the axon where it is triggered (the IS spike), and a large-sized spike pro-\nduced by back-propagation into the soma and dendrites of the cell (the soma-dendritic or\nSD spike) (Fig. 3). Their relative amplitudes depend upon the position of the electrode\ntip relative to the cell. Our aim is to isolate each of these components to understand the\nprocess by which the SD spike is sometimes suppressed. The events may overlap in time (in\nparticular the SD spike always overlaps with an IS spike), with varying positive amplitudes.\nThey are temporally compact, on the order of a millisecond, and they occur repeatedly\nbut sparsely throughout the recording, with positive amplitudes. The assumptions of our\nalgorithm are met by these data, as well as by multi-unit recordings re\ufb02ecting the activity\nof several neurons (the \u201cspike sorting problem\u201d).\nIn the portions of our data that are su\ufb03ciently sparse (spontaneous activity), the compo-\nnents may be separated by an ad-hoc procedure: (a) trigger on the high-amplitude IS-soma\ncomplexes and set to zero, (b) trigger on the remaining isolated IS spikes and average to\nderive an IS spike template (the pre-synaptic spike is treated as part of the IS spike), (c) \ufb01nd\nthe best match (in terms of regression) of the initial portion of the template to the initial\nportion of each IS-SD complex, (d) subtract the matching waveform to isolate the SD spikes,\nrealign, and average to derive an SD spike template. The resulting templates are shown in\nFig. 3 (top right). This ad-hoc procedure is highly dependent on prior assumptions, and\nwe wished to have a more general and \u201cagnostic\u201d method to apply to a wider range of\nsituations.\nFigure 3 (bottom) shows the result of our automated algorithm. The automatically recovered\nspike templates seem to capture a number of the key features. Template 1, in blue, resembles\nthe SD spike, and template 2, in red, is similar to the IS spike. The SD spikes are larger and\nhave sharper peaks as compared to the IS spikes, while the IS spikes have an initial peak\nat 0.7 ms leading the main spike. The larger size of the extracted spikes corresponding to\ntemplate 1 is correctly re\ufb02ected in the histogram of the recovered amplitudes. However the\nestimated spike shapes are inaccurate. The main di\ufb00erence is in the small peak preceding\nthe template 1. This is perhaps to be expected as the SD spike is always preceded in the\nraw data by a smaller IS spike. The expected templates were very similar (with a cosine of\n38\u2013 as estimated from the manually extracted spikes), making the task particularly di\ufb03cult.\n\n00.511.500.51Detection\u03c3N/\u03c3A00.511.500.51Weighted detection\u03c3N/\u03c3A00.511.500.51Misclassification\u03c3N/\u03c3A00.511.500.51False alarm\u03c3N/\u03c3A00.511.500.51R2 B\u03c3N/\u03c3A00.511.500.51R2 A\u03c3N/\u03c3A00.5100.51DetectionSimilarity00.5100.51Weighted detectionSimilarity00.5100.51MisclassificationSimilarity00.5100.51False alarmSimilarity00.5100.51R2 BSimilarity00.5100.51R2 ASimilarity\fFigure 3: Experimental results on extracellular recordings. Top left: reconstructed waveform\n(blue) and residual between the original data and the reconstructed waveform (red). Top\nright: templates B estimated manually from the data. Bottom left: estimated templates\nB. Bottom right: distribution of estimated amplitudes A. The SD spikes (blue) generally\noccur with larger amplitudes than the IS spikes (red).\n\n4.3 Implementation details\n\nAs with the original NMF and semi-NMF algorithms, the present algorithm is only locally\nconvergent. To obtain good solutions, we restart the algorithm several times with random\ninitializations for A (drawn independently from the uniform distribution in [0, 1]) and select\nthe solution with the maximum posterior likelihood or minimum cost (11). In addition to\nthese multiple restarts, we use a few heuristics that are motivated by the desired result\nof spike detection. We can thus prevent the algorithm from converging to some obviously\nsuboptimal solutions:\nRe-centering the templates: We noticed that local minima with poor performance typ-\nically occurred when the templates B were not centered within the L lags. In those cases\nthe main peaks could be adjusted to \ufb01t the data, but the portion of the template that\nextends outside the window of L samples could not be adjusted. To prune these suboptimal\nsolutions, it was su\ufb03cient to center the templates during the updates while shifting the\namplitudes accordingly.\nPruning events: We observed that spikes tended to generate non-zero amplitudes in A\nin clusters of 1 to 3 samples. After convergence we compact these to pulses of 1-sample\nduration located at the center of these clusters. Spike amplitude was preserved by scaling\nthe pulse amplitudes to match the sum of amplitudes in the cluster.\nRe-training with a less conservative sparseness constraint: To ensure that templates\nB are not a\ufb00ected by noise we initially train the algorithm with a strong penalty term (large\n\u03b2 e\ufb00ectively assuming strong noise power \u03c32\nN ). Only spikes with large amplitudes remain\nafter convergence and the templates are determined by only those strong spikes that have\nhigh SNR. After extracting templates accurately, we retrain the model amplitudes A while\nkeeping the templates B \ufb01xed assuming now a weaker noise power (smaller \u03b2).\nAs a result of these steps, the algorithm converged frequently to good solutions (approxi-\nmately 50 % of the time on the simulated data). The performance reported here represents\nthe results with minimum error after 6 random restarts.\n\n5 Discussion and outlook\n\nAlternative models: The present 1D formulation of the problem is similar to that of\nMorten M\u00f8rup [4] who presented a 2D version of this model that is limited to non-negative\ntemplates. We have also derived a version of the model with observations X arranged as\na matrix, as well as a version in which event timing is encoded explicitly as time delays \u03c4n\n\n00.10.20.30.40.5\u22121012Time (s)Reconstructed waveform and residual  00.6251.251.8752.5Manually constructed templates BmVTime (ms)  00.6251.251.8752.5Estimated templates BmVTime (ms)  0.050.10.150.201020AmplitudeFrequencyDistribution of the amplitudes A  ReconstructedResidual1 SD2 IS1212\ffollowing [8]. We are omitting these alternative formulations here for the sake of brevity.\nAlternative priors: In addition to the generalized Gaussian prior, we tested also Gaussian\nprocess priors [9] to encourage orthogonality between the k sequences and refractoriness in\ntime. However, we found that the quadratic expression of a Gaussian process competed\nwith the L\u03b1 sparseness term. In the future, we intend to combine both criteria by allowing\nfor correlations in the generalized Gaussian. The corresponding distributions are known\nas elliptically symmetric densities [10] and the corresponding process is called a spherically\ninvariant random processes, e.g., [11].\nSparseness and dimensionality reduction: As with many linear decomposition meth-\nods, a key feature of the algorithm is to represent the data within a small linear subspace.\nThis is particularly true for the semi-NMF algorithm since, provided a su\ufb03ciently large K\nand without enforcing a sparsity constraint, the positivity constraint on A actually amounts\nto no constraint at all (identical templates with opposite sign can accomplish the same\nas allowing negative A). For instance, without sparseness constraint on the amplitudes, a\ntrivial solution in our examples above would be a template B1l with a single positive spike\nsomewhere and another template B2l with a single negative spike, and all the time course\nencoded in An1 and An2.\nMISO identi\ufb01cation: The identi\ufb01ability problem is compounded by the fact that the\nestimation of templates B in this present formulation represents a multiple-input single-\noutput (MISO) system identi\ufb01cation problem. In the general case, MISO identi\ufb01cation is\nknown to be under-determined [12]. In the present case, the ambiguities of MISO identi-\n\ufb01cation may be limited due to the fact that we allow only for limited system length L as\ncompared to the number of samples N. Essentially, as the number of examples increases\nwith increasing length of the signal X, the ambiguity in B is reduced.\nThese issues will be adressed in future work.\n\nReferences\n\n[1] S. Mallat and Z. Zhang, \u201cMatching pursuit with time-frequency dictionnaries,\u201d IEEE Trans.\n\nSignal Process., vol. 41, pp. 3397\u20133415, 1993.\n\n[2] C. Ding, T. Li, and M. I. Jordan, \u201cConvex and semi-nonnegative matrix factorization for\nclustering and low-dimension representation,\u201d Lawrence Berkeley National Laboratory, Tech.\nRep. LBNL-60428, 2006.\n\n[3] T. Li and C. Ding, \u201cThe relationships among various nonnegative matrix factorization methods\n\nfor clustering,\u201d in Proc. ICDM, 2006, pp. 362\u2013371.\n\n[4] M. M\u00f8rup, M. N. Schmidt, and L. K. Hansen, \u201cShift invariant sparse coding of image and\n\nmusic data,\u201d Technical University of Denmark, Tech. Rep. IMM2008-04659, 2008.\n\n[5] H. Kameoka, N. Ono, K. Kashino, and S. Sagayama, \u201cComplex NMF: A new sparse represen-\n\ntation for acoustic signals,\u201d in Proc. ICASSP, Apr. 2009.\n\n[6] P. X. Joris, L. H. Carney, P. H. Smith, and T. C. T. Yin, \u201cEnhancement of neural syn-\nchronization in the anteroventral cochlear nucleus. I. Responses to tones at the characteristic\nfrequency,\u201d J. Neurophysiol., vol. 71, pp. 1022\u20131036, 1994.\n\n[7] S. Arkadiusz, M. Sayles, and I. M. Winter, \u201cSpike waveforms in the anteroventral cochlear\n\nnucleus revisited,\u201d in ARO midwinter meeting, no. Abstract #678, 2008.\n\n[8] M. M\u00f8rup, K. H. Madsen, and L. K. Hansen, \u201cShifted non-negative matrix factorization,\u201d in\n\nProc. MLSP, 2007, pp. 139\u2013144.\n\n[9] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, ser. Adap-\n\ntive Computation and Machine Learning. Cambridge, MA: The MIT Press, Jan. 2006.\n\n[10] K. Fang, S. Kotz, and K. Ng, Symmetric Multivariate and Related Distributions. London:\n\nChapman and Hall, 1990.\n\n[11] M. Rangaswamy, D. Weiner, and A. Oeztuerk, \u201cNon-Gaussian random vector identi\ufb01cation\nusing spherically invariant random processes,\u201d IEEE Trans. Aerospace and Electronic Systems,\nvol. 29, no. 1, pp. 111\u2013123, Jan. 1993.\n\n[12] J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing. Berlin, Germany:\n\nSpringer-Verlag, 2008.\n\n\f", "award": [], "sourceid": 740, "authors": [{"given_name": "Jonathan", "family_name": "Roux", "institution": null}, {"given_name": "Alain", "family_name": "Cheveign\u00e9", "institution": null}, {"given_name": "Lucas", "family_name": "Parra", "institution": null}]}