{"title": "Learning Multi-level Sparse Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 818, "page_last": 826, "abstract": "Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neurosciences image sequence considered here, there are the semantic concepts of pixel $\\rightarrow$ neuron $\\rightarrow$ assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lower-level to higher-level concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that allows one to optimize the decomposition over all levels jointly, rather than in a greedy level-by-level fashion. The proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that allows one to simultaneously interpret a calcium imaging sequence in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of real-world calcium imaging data.", "full_text": "Learning Multi-level Sparse Representations\n\nFerran Diego\n\nFred A. 
Hamprecht\n\nHeidelberg Collaboratory for Image Processing (HCI)\nInterdisciplinary Center for Scientific Computing (IWR)\nUniversity of Heidelberg, Heidelberg 69115, Germany\n\n{ferran.diego,fred.hamprecht}@iwr.uni-heidelberg.de\n\nAbstract\n\nBilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neurosciences image sequence considered here, there are the semantic concepts of pixel \u2192 neuron \u2192 assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lower-level to higher-level concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that allows one to optimize the decomposition over all levels jointly, rather than in a greedy level-by-level fashion.\n\nThe proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that allows one to simultaneously interpret a calcium imaging sequence in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of real-world calcium imaging data.\n\n1 Introduction\n\nThis work was stimulated by a concrete problem, namely the decomposition of state-of-the-art 2D + time calcium imaging sequences as shown in Fig. 
1 into neurons and assemblies of neurons [20]. Calcium imaging is an increasingly popular tool for unraveling the network structure of local circuits of the brain [11, 6, 7]. Leveraging sparsity constraints seems natural, given that the neural activations are sparse in both space and time. The experimentally achievable optical slice thickness still results in spatial overlap of cells, meaning that each pixel can show intensity from more than one neuron. In addition, it is anticipated that one neuron can be part of more than one assembly. All neurons of an assembly are expected to fire at roughly the same time [20].\n\nA standard sparse decomposition of the set of vectorized images into a dictionary and a set of coefficients would not conform with the prior knowledge that we have entities at three levels: the pixels, the neurons, and the assemblies (see Fig. 2). Nor would it allow structured constraints [10] to be included in a meaningful way. As a consequence, we propose a multi-level decomposition (Fig. 3) that\n\n\u2022 allows enforcing (structured) sparsity constraints at each level,\n\u2022 admits both hierarchical and heterarchical relations between levels (Fig. 2),\n\u2022 can be learned jointly (sections 2 and 2.4), and\n\u2022 yields good results on real-world experimental data (Fig. 2).\n\nFigure 1: Left: frames from a calcium imaging sequence showing firing neurons that were recorded by an epi-fluorescence microscope. Right: two frames from a synthetic sequence. The underlying biological aim motivating these experiments is to study the role of neuronal assemblies in memory consolidation.\n\n1.1 Relation to Previous Work\n\nMost of the important unsupervised data analysis methods, such as PCA, NMF / pLSA, ICA, cluster analysis, sparse coding and others, can be written in terms of a bilinear decomposition of, or approximation to, a two-way matrix of raw data [22]. 
One natural generalization is to perform multilinear decompositions of multi-way arrays [4] using methods such as the higher-order SVD [1]. This is not the direction pursued here, because the image sequence considered does not have a tensorial structure. On the other hand, there is a relation to (hierarchical) topic models (e.g. [8]). These do not use structured sparsity constraints, but go beyond our approach in automatically estimating the appropriate number of levels using nonparametric Bayesian models.\n\nClosest to our proposal are four lines of work that we build on. Jenatton et al. [10] introduce structured sparsity constraints that we use to find dictionary basis functions representing single neurons. The works [9] and [13] enforce hierarchical (tree-structured) sparsity constraints; these authors find the tree structure using external methods, such as a separate clustering procedure. In contrast, the method proposed here can infer either hierarchical (tree-structured) or heterarchical (directed acyclic graph) relations between entities at different levels. Cichocki and Zdunek [3] proposed a multilayer approach to non-negative matrix factorization. This is a multi-stage procedure which iteratively decomposes the rightmost matrix of the previously found decomposition. Similar approaches are explored in [23] and [24]. Finally, Rubinstein et al. [21] proposed a novel dictionary structure where each basis function in a dictionary is a linear combination of a few elements from a fixed base dictionary. In contrast to these last two methods, we optimize over all factors (including the base dictionary) jointly. Note that our semantics of \u201cbilevel factorization\u201d (section 2.2) differ from those in [25].\n\nNotation. A matrix can be written either as a set of columns or as a set of rows, $X = [x_{:1}, \\ldots, x_{:n}] = [x_{1:}; \\ldots; x_{m:}]$. The zero matrix or vector is denoted 0, with dimensions inferred from the context. 
For any vector $x \\in \\mathbb{R}^m$, $\\|x\\|_\\alpha = (\\sum_{i=1}^m |x_i|^\\alpha)^{1/\\alpha}$ is the $l_\\alpha$ (quasi-)norm of x, and $\\|\\cdot\\|_F$ is the Frobenius norm.\n\n2 Learning a Sparse Heterarchical Structure\n\n2.1 Dictionary Learning: Single Level Sparse Matrix Factorization\n\nLet $X \\in \\mathbb{R}^{m \\times n}$ be a matrix whose n columns each represent an m-dimensional observation. The idea of dictionary learning is to find a decomposition $X \\approx D[U^0]^T$, see Fig. 3(a). D is called the dictionary, and its columns hold the basis functions in terms of which the sparse coefficients in U0 approximate the original observations. The regularization term \u2126U encourages sparsity of the coefficient matrix. \u2126D prevents the inflation of dictionary entries to compensate for small coefficients and, if desired, induces additional structure on the learned basis functions [16]. Interesting theoretical results on support recovery, together with an elegantly compact formulation and the ready availability of optimizers [17], have spawned a large number of intriguing and successful applications, e.g. image denoising [19] and detection of unusual events [26]. Dictionary learning is a special instance of our framework, involving only a single-level decomposition. In the following, we first generalize to two, then to more levels.\n\nFigure 2: Bottom left: temporal activation patterns of individual neurons, U0 (lower level), and of assemblies of neurons, U1 (upper level). Neurons D and assemblies are related by a bipartite graph A1, the estimation of which is a central goal of this work. The signature of five neuronal assemblies (five columns of DA1) in the spatial domain is shown at the top. The outlines in the middle of the bottom show the union of all neurons found in D, superimposed onto a maximum intensity projection across the background-subtracted raw image sequence. 
The graphs on the right show a different view on the transients estimated for single neurons, that is, the rows of U0. The raw data come from a mouse hippocampal slice, where single neurons can indeed be part of more than one assembly [20]. Analogous results on synthetic data are shown in the supplemental material.\n\n[Figure 2 axis labels: heterarchical correspondence; id neuron; id assemblies; times (frames)]\n\nFigure 3 (panels a to d): Decomposition of X into {1, 2, 3, L + 1} levels, with corresponding equations.\n\n2.2 Bilevel Sparse Matrix Factorization\n\nWe now come to the heart of this work. To build intuition, we first refer to the application that has motivated this development, before giving mathematical details. The relation between the symbols used in the following is sketched in Fig. 3(b), while actual matrix contents are partially visualized in Fig. 2.\n\nWe are given a sequence of n noisy sparse images, which we vectorize and collect in the columns of matrix X. We would like to find the following:\n\n\u2022 a dictionary D of q0 vectorized images comprising m pixels each. Ideally, each basis function should correspond to a single neuron.\n\u2022 a matrix A1 indicating to what extent each of the q0 neurons is associated with any of the q1 neuronal assemblies. In the following, we will refer to this matrix interchangeably as the assignment or adjacency matrix. It is this matrix which encapsulates the quintessential structure we extract from the raw data, viz., which lower-level concept is associated with which higher-level concept.\n\u2022 a coefficient matrix [U1]T that encodes in its rows the temporal evolution (activation) of the q1 neuronal assemblies across n time steps.\n\u2022 a coefficient matrix [U0]T (shown in the equation, but not in the sketch of Fig. 3(b)) that encodes in its rows the temporal activation of the q0 neuron basis functions across n time steps.\n\nThe quantities D, A1, U0, U1 in this redundant representation need to be consistent.\n\nLet us now turn to equations. At first sight, it seems that minimizing $\\|X - DA^1[U^1]^T\\|_F^2$ over D, A1, U1 subject to constraints should do the job. However, this could be too much of a simplification! To illustrate, assume for the moment that only a single neuronal assembly is active at any given time. Then all neurons associated with that assembly would follow an absolutely identical time course. While neurons from an assembly are expected to show similar activation patterns [20], this is something we want to glean from the data rather than impose. In response, we introduce an auxiliary matrix $U^0 \\approx U^1[A^1]^T$ showing the temporal activation pattern of individual neurons. These two matrices, U0 and U1, are also shown in the false-color plots of the collage in Fig. 2, bottom left.\n\nThe full equation involving the coefficient and auxiliary coefficient matrices is shown in Fig. 3(b). The terms involving X are data fidelity terms, while $\\|U^0 - U^1[A^1]^T\\|_F^2$ enforces consistency. Parameters \u03b7 trade off the various terms, and constraints of a different kind can be applied selectively to each of the matrices that we optimize over. Jointly optimizing over D, A1, U0, and U1 is a hard, non-convex problem that we address using a block coordinate descent strategy described in section 2.4 and the supplemental material.\n\n2.3 Trilevel and Multi-level Sparse Matrix Factorization\n\nWe now discuss the generalization to an arbitrary number of levels that may be relevant for applications other than calcium imaging. To give a better feeling for the structure of the equations, the trilevel case is spelled out explicitly in Fig. 3(c), while Fig. 
3(d) shows the general case of L + 1 levels.\n\nThe most interesting matrices, in many ways, are the assignment matrices A1, A2, etc. Assume, first, that the relations between lower-level and higher-level concepts obey a strict inclusion hierarchy. Such relations can be expressed in terms of a forest of trees: each highest-level concept is the root of a tree which fans out to all subordinate concepts, and each subordinate concept has a single parent only. Such a forest can also be seen as a special case of an (L + 1)-partite graph, with an adjacency matrix Al specifying the parents of each concept at level l \u2212 1. To impose an inclusion hierarchy, one can enforce the nestedness condition by requiring that $\\|a^l_{k:}\\|_0 \\le 1$ for every row k of Al, i.e. every concept at level l \u2212 1 has at most one parent at level l.\n\nIn general, and in the application considered here, one will not want to impose an inclusion hierarchy. In that case, the relations between concepts can be expressed in terms of a concatenation of bipartite graphs that conform with a directed acyclic graph. Again, the adjacency matrices encode the structure of such a directed acyclic graph.\n\nIn summary, the general equation in Fig. 3(d) is a principled alternative to simpler approaches that would impose the relations between concepts, or estimate them separately using, for instance, clustering algorithms, and would then find a sparse factorization subject to this structure. Instead, we simultaneously estimate the relation between concepts at different levels and find a sparse approximation to the raw data.\n\n2.4 Optimization\n\nThe optimization problem in Fig. 3(d) is not jointly convex, but it becomes convex w.r.t. one variable while keeping the others fixed, provided that the norms \u2126U, \u2126D, and \u2126A are also convex. 
Indeed, it is possible to define convex norms that not only induce sparse solutions, but also favor non-zero patterns of a specific structure, such as sets of variables in a convex polygon with certain symmetry constraints [10]. Following [5], we use such norms to bias the solution towards neuron basis functions holding a single neuron only. We employ a block coordinate descent strategy [2, Section 2.7] that iteratively optimizes one group of variables while fixing all others. Due to space limitations, the details and implementation of the optimization are described in the supplemental material.\n\n3 Methods\n\n3.1 Decomposition into neurons and their transients only\n\nCell Sorting [18] and Adina [5] focus only on the detection of cell centroids and cell shapes, and on the estimation and analysis of calcium transient signals. However, these methods provide no means to detect and identify neuronal co-activation. The key idea of both is to decompose calcium imaging data into constituent signal sources, i.e. temporal and spatial components. Cell Sorting combines principal component analysis (PCA) and independent component analysis (ICA). In contrast, Adina relies on a matrix factorization based on sparse coding and dictionary learning [15], exploiting the fact that neuronal activity is sparsely distributed in both space and time. Both methods are combined with a subsequent image segmentation, since the spatial components (basis functions) often contain more than one neuron. Without such a segmentation step, overlapping cells or those with highly correlated activity are often associated with the same basis function.\n\n3.2 Decomposition into neurons, their transients, and assemblies of neurons\n\nMNNMF+Adina. Here, we combine a multilayer extension of non-negative matrix factorization with the segmentation from Adina. MNNMF [3] is a multi-stage procedure that iteratively decomposes the rightmost matrix of the previously found decomposition. 
In the first stage, we decompose the calcium imaging data into spatial and temporal components, just like the methods cited above, but using NMF and a non-negative least squares loss function [12] as implemented in [14]. We then use the segmentation from [5] to obtain single neurons in an updated dictionary\u00b9 D. Given this purged dictionary, the temporal components U0 are updated under the NMF criterion. Next, the temporal components U0 are further decomposed into two low-rank matrices, $U^0 \\approx U^1[A^1]^T$, again using NMF. Altogether, this procedure allows identifying neuronal assemblies and their temporal evolution. However, the exact number of assemblies q1 must be defined a priori.\n\nKSVDS+Adina allows estimating a sparse decomposition [21] $X \\approx DA^1[U^1]^T$, provided that i) a dictionary of basis functions and ii) the exact number of assemblies are supplied as input. In addition, the assignment matrix A1 is typically dense and needs to be thresholded. We obtain good results when supplying the purged dictionary\u00b9 of single neurons resulting from Adina [5].\n\nSHMF \u2013 Sparse Heterarchical Matrix Factorization in its bilevel formulation decomposes the raw data simultaneously into neuron basis functions D, a mapping of these to assemblies A1, and time courses of neurons U0 and assemblies U1; see the equation in Fig. 3. Sparsity is induced by setting \u2126U and \u2126A to the l1-norm. In addition, we impose the l2-norm at the assembly level, $\\Omega_D^1$, and let \u2126D be the structured sparsity-inducing norm proposed by Jenatton et al. [10]. In contrast to all other approaches described above, this already suffices to produce basis functions that contain, in most cases, only single neurons. Exceptions arise only for cells that both overlap in space and have high temporal correlation. For this reason, and for a fair comparison with the other methods, we again use the segmentation from [5]. For the optimization, D and U0 are initialized with the results from Adina. U1 is initialized randomly with positive-truncated Gaussian noise, and A1 with the identity matrix, as in KSVDS [21]. Finally, the numbers of neurons q0 and of neuronal assemblies q1 are set to generous upper bounds of the expected true numbers, and are both set to equal values (here: q0 = q1 = 60) for simplicity. Note that, unlike for the above methods, a precise specification is not required.\n\n\u00b9 Without such a segmentation step, the dictionary atoms often comprise more than one neuron, and overall results (not shown) are poor.\n\n4 Results\n\nTo obtain quantitative results, we first evaluate the proposed methods on synthetic image sequences designed to exhibit characteristics similar to those of the real data. We also report a qualitative analysis of the performance on real data from [20]. Since neuronal assemblies are still the subject of ongoing research, ground truth is not available for such real-world data.\n\n4.1 Artificial Sequences\n\nFor evaluation, we created 80 synthetic sequences with 450 frames of size 128 \u00d7 128 pixels at a frame rate of 30 fps. The data are created by randomly selecting cell shapes from 36 different active cells extracted from real data, and placing them at different locations with an overlap of up to 30%. Each cell is randomly assigned to up to three out of a total of five assemblies. Each assembly fires according to a dependent Poisson process, with transient shapes following a one-sided exponential decay with a scale of 500 to 800 ms that is convolved with a Gaussian kernel with \u03c3 = 50 ms. The dependency is induced by eliminating all transients that overlap by more than 20%. Within such a transient, the neurons associated with the assembly fire with a probability of 90% each. 
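The following minimal Python sketch makes the transient model of the synthetic data concrete by building one such waveform: a one-sided exponential decay sampled at 30 fps and smoothed with a Gaussian kernel of σ = 50 ms. Several choices are illustrative assumptions and do not come from the authors' simulation code: the function name transient_template, the default decay constant of 650 ms (picked from the stated 500 to 800 ms range), the truncation of the kernel at three standard deviations, and the normalization to unit peak.

```python
import math


def transient_template(tau_ms=650.0, sigma_ms=50.0, fps=30.0, duration_ms=3000.0):
    # Hypothetical sketch of one synthetic calcium transient: a one-sided
    # exponential decay exp(-t / tau) for t >= 0, smoothed by a Gaussian kernel.
    # The defaults for tau_ms and duration_ms, the kernel truncation at 3 sigma
    # and the peak normalization are assumptions, not values from the paper.
    dt = 1000.0 / fps  # frame spacing in ms (about 33.3 ms at 30 fps)
    n = int(round(duration_ms / dt))
    decay = [math.exp(-k * dt / tau_ms) for k in range(n)]

    # Discrete Gaussian kernel, truncated at +-3 sigma and normalized to sum 1.
    half = max(1, int(math.ceil(3.0 * sigma_ms / dt)))
    kernel = [math.exp(-0.5 * (j * dt / sigma_ms) ** 2) for j in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]

    # Convolution cropped to the length of the decay trace; samples before the
    # onset and after the end of the trace are treated as zero.
    out = []
    for k in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = k - (j - half)
            if 0 <= idx < n:
                acc += w * decay[idx]
        out.append(acc)

    # Scale so that the peak of the smoothed transient equals 1.
    peak = max(out)
    return [v / peak for v in out]


trace = transient_template(tau_ms=800.0)
print(len(trace), trace.index(max(trace)))
```

A longer decay constant yields a slower fall-off, which spans the 500 to 800 ms range quoted above. The sketch does not show the remaining steps: drawing assembly onsets from the dependent Poisson process, applying the 20% overlap rule, and applying the 90% per-neuron firing probability.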
The number of cells per assembly varies from 1 to 10, and we use five assemblies in all experiments. Finally, the synthetic movies are distorted by white Gaussian noise with a relative amplitude (max. intensity \u2212 mean intensity)/\u03c3_noise \u2208 {3, 5, 7, 10, 12, 15, 17, 20}. By construction, the identity, location and activity patterns of all cells, along with their membership in assemblies, are known. The supplemental material shows one example, and two frames are shown in Fig. 1.\n\nIdentification of assemblies. First, we want to quantify the ability to correctly infer assemblies from an image sequence. To that end, we compute the graph edit distance between the estimated assignments of neurons to assemblies, encoded in matrices A1, and the known ground truth. We count the number of false positive and false negative edges in the assignment graphs, where vertices (assemblies) are matched by minimizing the Hamming distance between binarized assignment matrices over all permutations.\n\nRecall that MNNMF+Adina and KSVDS+Adina require a specification of the precise number of assemblies, which is unknown for real data. Accordingly, adjacency matrices $A^1 \\in \\mathbb{R}^{q_0 \\times q_1}$ were estimated for numbers of assemblies q1 from 3 to 7. Bilevel SHMF only needs an upper bound on the number of assemblies. Its performance is independent of the precise value, but computational cost increases with the bound. In these experiments, q1 was set to 60.\n\nFig. 4 shows that all methods from section 3.2 give respectable performance in the task of inferring neuronal assemblies from nontrivial synthetic image sequences. For the true number of assemblies (q1 = 5), bilevel SHMF reaches a higher sensitivity than the alternative methods, with a median difference of 14%. 
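The assignment score just described can be sketched in a few lines of Python. The sketch makes the following assumptions:

- The estimated and ground-truth assignment matrices are given as nested lists (neurons by assemblies).
- Both are binarized with a hypothetical threshold of 0.5.
- When the two matrices differ in their number of columns, the smaller one is padded with empty assemblies.

The helper names are hypothetical. Sensitivity and precision are computed from the matched edges as TP/(TP+FN) and TP/(TP+FP). This is an assumption about how the curves in Fig. 4 are obtained, not a definition stated in the text.

```python
from itertools import permutations


def binarize(A, thr=0.5):
    # Turn a (neurons x assemblies) assignment matrix, given as a list of rows,
    # into a 0/1 edge matrix. The threshold is an assumption.
    return [[1 if a > thr else 0 for a in row] for row in A]


def pad_columns(B, q):
    # Append empty assemblies so that a matrix has exactly q columns.
    return [row + [0] * (q - len(row)) for row in B]


def assignment_errors(A_est, A_true, thr=0.5):
    # Hypothetical sketch: match estimated to true assemblies (columns) with the
    # permutation that minimizes the Hamming distance between the binarized
    # matrices, then count false-positive and false-negative edges.
    # Exhaustive search is only practical for a handful of assemblies; since the
    # cost is a sum over matched column pairs, a linear assignment solver
    # (Hungarian algorithm) could replace it for larger problems.
    E, T = binarize(A_est, thr), binarize(A_true, thr)
    q = max(len(E[0]), len(T[0]))
    E, T = pad_columns(E, q), pad_columns(T, q)
    best = None
    for perm in permutations(range(q)):
        fp = fn = 0
        for e_row, t_row in zip(E, T):
            for j in range(q):
                e, t = e_row[perm[j]], t_row[j]
                fp += int(e == 1 and t == 0)
                fn += int(e == 0 and t == 1)
        if best is None or fp + fn < sum(best):
            best = (fp, fn)
    fp, fn = best
    # Sensitivity and precision over edges (assumed definition, see lead-in).
    tp = sum(sum(row) for row in T) - fn
    sensitivity = tp / (tp + fn) if tp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    return fp, fn, sensitivity, precision


A_true = [[1, 0], [1, 0], [0, 1]]  # 3 neurons, 2 assemblies (toy example)
A_est = [[0.1, 0.9], [0.0, 0.8], [0.7, 0.2]]  # same structure, columns swapped
print(assignment_errors(A_est, A_true))  # prints (0, 0, 1.0, 1.0)
```

With the upper bound q1 = 60 used for bilevel SHMF, exhaustive search over all 60! permutations is infeasible, so a linear assignment solver would be needed in practice.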
According to the quartiles, the precisions achieved are broadly comparable, with MNNMF+Adina reaching the highest value.\n\nAll methods from section 3.2 also infer the temporal activity of all assemblies, U1. We omit a comparison of these matrices for lack of a good metric that would also take into account the correctness of the assemblies themselves: a fine time course has little worth if its associated assembly is deficient, for instance by having lost some neurons with respect to ground truth.\n\nFigure 4 (panels: Sensitivity, Precision): Performance on learning correct assignments of neurons to assemblies from nontrivial synthetic data with ground truth. KSVDS+Adina and MNNMF+Adina require that the number of assemblies q1 be fixed in advance. In contrast, bilevel SHMF estimates the number of assemblies given an upper bound; its performance is hence shown as a constant over the q1-axis. Plots show the median as well as the band between the lower and the upper quartile for all 80 sequences. Colors at non-integer q1-values are a guide to the eye.\n\nDetection of calcium transients. While the detection of assemblies as evaluated above is completely new in the literature, we now turn to a better-studied problem [18, 5]: the detection of calcium transients of individual neurons. Some estimates of these characteristic waveforms are also shown, for real-world data, on the right-hand side of Fig. 2.\n\nTo quantify transient detection performance, we compute the sensitivity and precision as in [20]. Here, sensitivity is the ratio of correctly detected neuronal activities to all neuronal activities, and precision is the ratio of correctly detected neuronal activities to all detected neuronal activities. Results are shown in Fig. 5.\n\nFigure 5: Sensitivity and precision of transient detection for individual neurons. 
Methods that estimate both assemblies and neuron transients perform at least as well as their simpler counterparts that focus on the latter.\n\nPerhaps surprisingly, the methods from section 3.2 (MNNMF+Adina and bilevel SHMF\u00b2) fare at least as well as those from section 3.1 (Cell Sorting and Adina). This is not self-evident, because a bilevel factorization could be expected to be more ill-posed than a single-level factorization. We make two observations. Firstly, it seems that using a bilevel representation with suitable regularization constraints helps stabilize the activity estimates also for single neurons. Secondly, the higher sensitivity and similar precision of bilevel SHMF compared to MNNMF+Adina suggest that a joint estimation of neurons, assemblies and their temporal activities as described in section 2 increases the robustness, and compensates errors that may not be corrected in greedy level-by-level estimation. Incidentally, the large spread of both sensitivities and precisions results from the wide range of noise levels used in the simulations, and attests to the difficulty of some of the synthetic data sets.\n\n\u00b2 KSVDS is not evaluated here because it does not yield activity estimates for individual neurons.\n\nFigure 6 (panels: raw data; Cell Sorting [18]; Adina [5]; neurons, $D[U^0]^T$; assemblies, $DA^1[U^1]^T$): Three examples of raw data and reconstructed images at the times indicated in Fig. 2. The other examples are shown in the supplemental material.\n\n4.2 Real Sequences\n\nWe have applied bilevel SHMF to epifluorescent data sets from mouse (C57BL6) hippocampal slice cultures. As shown in Fig. 2, the method is able to distinguish overlapping cells and highly correlated cells, while at the same time estimating neuronal co-activation patterns (assemblies). 
Exploiting spatio-temporal sparsity and convex cell shape priors allows the transient events to be inferred accurately.\n\n5 Discussion\n\nThe proposed multi-level sparse factorization essentially combines a clustering of concepts across several levels (expressed by the assignment matrices) with finding a basis dictionary, shared by concepts at all levels, and finding coefficient matrices for the different levels. The formalism allows imposing different regularizers at different levels. Users need to choose the tradeoff parameters \u03b7, \u03bb that indirectly determine the number of concepts (clusters) found at each level, as well as the sparsity. The ranks ql, on the other hand, are less important: Figure 2 shows that the ranks of estimated matrices can be lower than their nominal dimensionality, since superfluous degrees of freedom are simply not used.\n\nOn the application side, the proposed method makes it possible to detect neurons, assemblies and their relation in a single framework, exploiting sparseness in the temporal and spatial domains in the process. Bilevel SHMF in particular is able to automatically detect, and differentiate between, overlapping and highly correlated cells, and to estimate the underlying co-activation patterns. As shown in Fig. 6, this approach is able to reconstruct the raw data at both levels of representation, and to make plausible proposals for neuron and assembly identification.\n\nGiven the experimental importance of calcium imaging, automated methods in the spirit of the one described here can be expected to become an essential tool for the investigation of complex activation patterns in live neural tissue.\n\nAcknowledgement\n\nWe are very grateful for partial financial support from the CellNetworks Cluster (EXC81). We also thank Susanne Reichinnek, Martin Both and Andreas Draguhn for their comments on the manuscript.\n\nReferences\n[1] G. Bergqvist and E. G. Larsson. 
The Higher-Order Singular Value Decomposition: Theory and an Application. IEEE Signal Processing Magazine, 27(3):151\u2013154, 2010.\n[2] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.\n[3] A. Cichocki and R. Zdunek. Multilayer nonnegative matrix factorization. Electronics Letters, 42:947\u2013948, 2006.\n[4] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley, 2009.\n[5] F. Diego, S. Reichinnek, M. Both, and F. A. Hamprecht. Automated identification of neuronal activity from calcium imaging by sparse dictionary learning. In International Symposium on Biomedical Imaging, in press, 2013.\n[6] W. Goebel and F. Helmchen. In vivo calcium imaging of neural network function. Physiology, 2007.\n[7] C. Grienberger and A. Konnerth. Imaging calcium in neurons. Neuron, 2011.\n[8] Q. Ho, J. Eisenstein, and E. P. Xing. Document hierarchies from text and links. In Proc. of the 21st Int. World Wide Web Conference (WWW 2012), pages 739\u2013748. ACM, 2012.\n[9] R. Jenatton, A. Gramfort, V. Michel, G. Obozinski, E. Eger, F. Bach, and B. Thirion. Multi-scale Mining of fMRI data with Hierarchical Structured Sparsity. SIAM Journal on Imaging Sciences, 5(3), 2012.\n[10] R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.\n[11] J. Kerr and W. Denk. Imaging in vivo: watching the brain in action. Nature Reviews Neuroscience, 2008.\n[12] H. Kim and H. Park. Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM J. on Matrix Analysis and Applications, 2008.\n[13] S. Kim and E. P. Xing. Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. 
Ann. Appl. Stat., 2012.\n[14] Y. Li and A. Ngom. The non-negative matrix factorization toolbox for biological data mining. In BMC Source Code for Biology and Medicine, 2013.\n[15] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009.\n[16] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online Learning for Matrix Factorization and Sparse Coding. Journal of Machine Learning Research, 2010.\n[17] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and R. Jenatton. Sparse modeling software. http://spams-devel.gforge.inria.fr/.\n[18] E. A. Mukamel, A. Nimmerjahn, and M. J. Schnitzer. Automated analysis of cellular signals from large-scale calcium imaging data. Neuron, 2009.\n[19] M. Protter and M. Elad. Image sequence denoising via sparse and redundant representations. IEEE Transactions on Image Processing, 18(1), 2009.\n[20] S. Reichinnek, A. von Kameke, A. M. Hagenston, E. Freitag, F. C. Roth, H. Bading, M. T. Hasan, A. Draguhn, and M. Both. Reliable optical detection of coherent neuronal activity in fast oscillating networks in vitro. NeuroImage, 60(1), 2012.\n[21] R. Rubinstein, M. Zibulevsky, and M. Elad. Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing, 2010.\n[22] A. P. Singh and G. J. Gordon. A unified view of matrix factorization models. ECML PKDD, 2008.\n[23] M. Sun and H. Van Hamme. A two-layer non-negative matrix factorization model for vocabulary discovery. In Symposium on machine learning in speech and language processing, 2011.\n[24] Q. Sun, P. Wu, Y. Wu, M. Guo, and J. Lu. Unsupervised multi-level non-negative matrix factorization model: Binary data case. Journal of Information Security, 2012.\n[25] J. Yang, Z. Wang, Z. Lin, X. Shu, and T. S. Huang. Bilevel sparse coding for coupled feature spaces. 
In CVPR\u201912, pages 2360\u20132367. IEEE, 2012.\n[26] B. Zhao, L. Fei-Fei, and E. P. Xing. Online detection of unusual events in videos via dynamic sparse coding. In The Twenty-Fourth IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.", "award": [], "sourceid": 470, "authors": [{"given_name": "Ferran", "family_name": "Diego Andilla", "institution": "University of Heidelberg"}, {"given_name": "Fred", "family_name": "Hamprecht", "institution": "University of Heidelberg"}]}