{"title": "Using Helmholtz Machines to Analyze Multi-channel Neuronal Recordings", "book": "Advances in Neural Information Processing Systems", "page_first": 131, "page_last": 137, "abstract": null, "full_text": "Using Helmholtz Machines to analyze \n\nmulti-channel neuronal recordings \n\nVirginia R. de Sa \ndesa@phy.ucsf.edu \n\nR. Christopher deC harms \n\ndecharms@phy.ucsf.edu \n\nMichael M. Merzenich \n\nmerz@phy.ucsf.edu \n\nSloan Center for Theoretical Neurobiology and \nW. M. Keck Center for Integrative Neuroscience \nUniversity of California, San Francisco CA 94143 \n\nAbstract \n\nOne of the current challenges to understanding neural information \nprocessing in biological systems is to decipher the \"code\" carried \nby large populations of neurons acting in parallel. We present an \nalgorithm for automated discovery of stochastic firing patterns in \nlarge ensembles of neurons. The algorithm, from the \"Helmholtz \nMachine\" family, attempts to predict the observed spike patterns in \nthe data. The model consists of an observable layer which is directly \nactivated by the input spike patterns, and hidden units that are ac(cid:173)\ntivated through ascending connections from the input layer. The \nhidden unit activity can be propagated down to the observable layer \nto create a prediction of the data pattern that produced it. Hidden \nunits are added incrementally and their weights are adjusted to im(cid:173)\nprove the fit between the predictions and data, that is, to increase a \nbound on the probability of the data given the model. This greedy \nstrategy is not globally optimal but is computationally tractable for \nlarge populations of neurons. We show benchmark data on artifi(cid:173)\ncially constructed spike trains and promising early results on neuro(cid:173)\nphysiological data collected from our chronic multi-electrode cortical \nimplant. 
\n\n1 \n\nIntroduction \n\nUnderstanding neural processing will ultimately require observing the response \npatterns and interactions of large populations of neurons. While many studies \nhave demonstrated that neurons can show significant pairwise interactions, and \nthat these pairwise interactions can code stimulus information [Gray et aI., 1989, \nMeister et aI., 1995, deCharms and Merzenich, 1996, Vaadia et al., 1995], there is \ncurrently little understanding of how large ensembles of neurons might function to(cid:173)\ngether to represent stimuli. This situation has arisen partly out of the historical \n\n\f132 \n\nV R. d. Sa, R. C. deCharms and M. M. Merzenich \n\ndifficulty of recording from large numbers of neurons simultaneously. Now that this \nis becoming technically feasible, the remaining analytical challenge is to understand \nhow to decipher the information carried in distributed neuronal responses. \n\nExtracting information from the firing patterns in large neuronal populations is dif(cid:173)\nficult largely due to the combinatorial complexity of the problem, and the uncer(cid:173)\ntainty about how information may be encoded. There have been several attempts \nto look for higher order correlations [Martignon et al., 1997] or decipher the activity \nfrom multiple neurons, but existing methods are limited in the type of patterns they \ncan extract assuming absolute reliability of spikes within temporal patterns of small \nnumbers of neurons [Abeles, 1982, Abeles and Gerstein, 1988, Abeles et al., 1993, \nSchnitzer and Meister,] or considering only rate codes [Gat and Tishby, 1993, \nAbeles et al., 1995]. Given the large numbers of neurons involved in coding sensory \nevents and the high variability of cortical action potentials, we suspect that mean(cid:173)\ningful ensemble coding events may be statistically similar from instance to instance \nwhile not being identical. 
Searching for these types of stochastic patterns is a more challenging task. \n\nOne way to extract the structure in a pattern dataset is to construct a generative model that produces representative data from hidden stochastic variables. Helmholtz machines [Hinton et al., 1995, Dayan et al., 1995] efficiently [Frey et al., 1996] produce generative models of datasets by maximizing a lower bound on the log likelihood of the data. Cascaded Redundancy Reduction [de Sa and Hinton, 1998] is a particularly simple form of Helmholtz machine in which hidden units are incrementally added. As each unit is added, it greedily attempts to best model the data using all the previous units. In this paper we describe how to apply the Cascaded Redundancy Reduction algorithm to the problem of finding patterns in neuronal ensemble data, test the performance of this method on artificial data, and apply the method to example neuronal spike trains. \n\n1.1 Cascaded Redundancy Reduction \n\nThe simplest form of generative model is to model each observed (or input) unit as a stochastic binary random variable with generative bias b_i. This generative input is passed through a transfer function to give a probability of firing: \n\np_i = a(b_i) = 1 / (1 + e^{-b_i})   (1) \n\nWhile this can model the individual firing rates of binary units, it cannot account for correlations in firing between units. Correlations can be modeled by introducing hidden units with generative weights to the correlated observed units. By cascading hidden units as in Figure 1, we can represent higher order correlations. Lower units sum up their total generative input from higher units and their generative bias: \n\nx_i = b_i + Σ_{j>i} s_j g_{j,i}   (2) \n\nFinding the optimal generative weights (g_{j,i}, b_i) for a given dataset involves an intractable search through an exponential number of possible states of the hidden units. 
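\n\nAs a minimal sketch of the top-down generative side of equations (1) and (2) (our own illustration, not the authors' code; the names sigmoid, generative_pass, and the convention that higher-indexed units sit above lower ones in the cascade are our assumptions): \n\n```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generative_pass(s, b, g):
    """One top-down generative sweep.

    s: binary states s_j of all units (indexed so that j > i means
       unit j lies above unit i in the cascade)
    b: generative biases b_i
    g: g[j, i] is the generative weight from unit j down to unit i
    Returns p_i = a(x_i), the firing probability of each unit.
    """
    n = len(b)
    p = np.empty(n)
    for i in range(n):
        x_i = b[i] + sum(s[j] * g[j, i] for j in range(i + 1, n))  # eq. (2)
        p[i] = sigmoid(x_i)                                        # eq. (1)
    return p
``` \n\nWith all weights zero, each unit fires with probability a(b_i) (0.5 at zero bias); nonzero descending weights let active higher units raise or lower the firing probabilities of the units below them. 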
\nHelmholtz machines approximate this problem by using forward recognition connections to compute an approximate distribution over hidden states for each data pattern. Cascaded Redundancy Reduction takes this approximation one step further by approximating the distribution by a single state. This makes the search for recognition and generative weights much simpler. Given a data vector, d, considering the state produced by the recognition connections as s^d gives a lower bound on the log likelihood of the data. \n\nFigure 1: The Cascaded Redundancy Reduction Network (generative connections point down, recognition connections point up). Hidden units are added incrementally to help better model the data. \n\nUnits are added incrementally with the goal of maximizing this lower bound, C: \n\nC = Σ_d [ s_k^d log a(b_k) + (1 - s_k^d) log(1 - a(b_k)) + Σ_i s_i^d log a(x_i^d) + (1 - s_i^d) log(1 - a(x_i^d)) ]   (3) \n\nBefore a unit is added it is considered as a temporary addition. Once its weights have been learned, it is added to the permanent network only if adding it reduces the cost on an independent validation set from the same data distribution. This is to prevent overtraining. While a unit is considered for addition, all weights other than those to and from the new unit and the generative bias weights are fixed. The learning of the weights to and from the new unit is then a fairly simple optimization problem involving treating the unit as stochastic, and performing gradient descent on the resulting modified lower bound. \n\n2 Method \n\nThis generic pattern finding algorithm can be applied to multi-unit spike trains by treating time as another spatial dimension as is often done for time series data. 
The spikes are binned on the order of a few to tens of milliseconds and the algorithm looks for patterns in finite time length windows by sliding a window centered on each spike from a chosen trigger channel. An example extracted window using channel 4 as the trigger channel is shown in Figure 2. \n\nBecause the number of spikes can be larger than one, the observed units (bins) are modeled as discrete Poisson random variables rather than binary random variables (the hidden units are still kept as binary units). To reflect the constraint that the expected number of spikes cannot be negative but may be larger than one, the transfer function for these observed bins was chosen to be exponential. Thus if x_i is the total summed generative input, λ_i, the expected mean number of spikes in bin i, is calculated as e^{x_i} and the probability of finding s spikes in that bin is given by \n\nP(s) = e^{-λ_i} λ_i^s / s!   (4) \n\nFigure 2: The input patterns for the algorithm are windows from the full spatio-temporal firing patterns. The full dataset is windows centered about every spike in the trigger channel. \n\nThe terms in the lower bound objective function due to the observed bins are modified accordingly. \n\n3 Experimental Results \n\nBefore applying the algorithm to real neural spike trains we have characterized its properties under controlled conditions. We constructed sample data containing two random patterns across 10 units spanning 100 msec. The patterns were stochastic such that each neuron had a probability of firing in each time bin of the pattern. Sample patterns were drawn from the stochastic pattern templates and embedded in other "noise" spikes. The sample pattern templates are shown in the first column of Figure 3. 300 seconds of independent training, validation and test data were generated. All results are reported on the test data. 
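\n\nThe observed-bin model of Section 2 is compact enough to state in code; this is a minimal sketch (the function names are ours) of the exponential transfer and the Poisson probability of equation (4): \n\n```python
import math

def expected_spikes(x_i):
    # Exponential transfer: lambda_i = e^{x_i} is always non-negative
    # and may exceed one, matching the constraint on spike counts.
    return math.exp(x_i)

def poisson_prob(s, lam):
    # Probability of observing s spikes in a bin with expected count lam,
    # eq. (4): P(s) = e^{-lam} * lam^s / s!
    return math.exp(-lam) * lam ** s / math.factorial(s)
``` \n\nFor example, a bin with total generative input x_i = 0 has an expected count of one spike, and for any fixed λ the probabilities over s sum to one, as they must. 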
\nAfter training the network, performance was assessed by stepping through the test data and observing the pattern of activation across the hidden units obtained from propagating activity through the forward (recognition) connections and their corresponding generative pattern {λ_i} obtained from the generative connections from the binary hidden unit pattern. Typically, many of the theoretically possible 2^n hidden unit patterns do not occur. Of the ones that do, several may code for the noise background. A crucial issue for interpreting patterns in real neural data is to discover which of the hidden unit activity patterns correspond to actual meaningful patterns. We use a measure that calculates the quality of the match of the observed pattern and the generative pattern it invokes. As the algorithm was not trained on the test data, close matches between the generative pattern and the observed pattern imply real structure that is common to the training and test dataset. With real neural data, this question can also be addressed by correlating the occurrence of patterns to stimuli or behavioural states of the animal. \n\nOne match measure we have used to pick out temporally modulated structure is the cost of coding the observed units using the hidden unit pattern compared to the cost of using the optimal rate code for that pattern (derived by calculating the firing rate for each channel in the window excluding the trigger bin). Match values were calculated for each hidden unit pattern by averaging the results across all its contributing observed patterns. Typical generative patterns of the added template patterns (in noise) are shown in the second column of Figure 3. The third column in the figure shows example matches from the test set (i.e. patterns that activated the hidden unit pattern corresponding to the generative pattern in column 2). 
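\n\nThe rate-code comparison just described can be sketched as follows (our own illustration under stated assumptions, not the authors' code: costs are negative log probabilities under the Poisson bin model, and for brevity this sketch computes the empirical rate over all bins rather than excluding the trigger bin as the paper does): \n\n```python
import numpy as np
from math import lgamma

def poisson_cost(counts, lam):
    # Cost (negative log probability, in nats) of coding the observed
    # spike counts with independent Poisson bins of mean lam.
    counts = np.asarray(counts, float)
    lam = np.asarray(lam, float)
    logfact = np.array([lgamma(c + 1.0) for c in counts.ravel()]).reshape(counts.shape)
    return float(np.sum(lam - counts * np.log(lam) + logfact))

def match_value(window, lam_generative):
    """Match measure sketch: cost under the per-channel rate code minus
    cost under the hidden units' generative prediction lam_generative.
    window is a (channels x time bins) array of spike counts; larger
    values mean the generative pattern captures temporal structure that
    a plain rate code misses."""
    rate = np.clip(window.mean(axis=1, keepdims=True), 1e-6, None)
    rate = np.broadcast_to(rate, window.shape)
    return poisson_cost(window, rate) - poisson_cost(window, lam_generative)
``` \n\nA generative pattern identical to the channel rate code scores zero; a generative pattern that concentrates the expected spikes in the correct time bins scores positive. 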
Note that the instances of the patterns are missing some spikes present in the template, and are surrounded by many additional spikes. \n\n[Figure 3 panels: Template, Generative Pattern, and Test set Example, for Pattern 1 and Pattern 2] \n\nFigure 3: Pattern templates, resulting generative patterns after training (showing the expected number of spikes the algorithm predicts for each bin), and example test set occurrences. The size and shade of the squares represents the probability of activation of that bin (or 0/1 for the actual occurrences); the colorbars go from 0 to 1. \n\nWe varied both the frequency of the pattern occurrences and that of the added background spikes. Performance as a function of the frequency of the background spikes is shown on the left in Figure 4 for a pattern frequency of 0.4 Hz. Performance as a function of the pattern frequency for a noise spike frequency of 15 Hz is shown on the right of the figure. False alarm rates were extremely low, ranging from 0-4% across all the tested conditions. Also, importantly, when we ran three trials with no added patterns, no patterns were detected by the algorithm. \n\n[Figure 4 plots: percentage of patterns correctly detected for pattern 1, pattern 2, and shifted pattern 1, versus average firing rate of background spikes in Hz (left) and frequency of pattern occurrence in Hz (right)] 
\nFigure 4: Graphs showing the effect of adding more background spikes (left) and decreasing the number of pattern occurrences in the dataset (right) on the percentage of patterns correctly detected. The detection of the shifted pattern is due to the presence of a second spike in channel 4 in the pattern (hits for this case are only calculated for the times when this spike was present - the others would all be missed). In fact in some cases the presence of the only slightly probable 3rd bin in channel 4 was enough to detect another shifted pattern 1. Means over 3 trials are plotted with the individual trial values given in parentheses. \n\nThe algorithm was then applied to recordings made from a chronic array of extracellular microelectrodes placed in the primary auditory cortex of one adult marmoset monkey and one adult owl monkey [deCharms and Merzenich, 1998]. On some electrodes spikes were isolated from individual neurons; others were derived from small clusters of nearby neurons. Figure 5 shows an example generative pattern (accounting for 2.8% of the test data) that had a high match value together with example occurrences in the test data. The data were responses recorded to vocalizations played to the marmoset monkey; channel 4 was used as the trigger channel and 7 hidden units were added. \n\nFigure 5: Data examples (all but top left) from neural recordings in an awake marmoset monkey that invoke the same generative pattern (top left). The instances are patterns from the test data that activated the same hidden unit activity pattern resulting in the generative pattern in the top left. The data windows were centered around all the spikes in channel 4. The brightest bins in the generative pattern represent an expected number of spikes of 1.7. In the actual patterns, the darkest and smallest bins represent a bin with 1 spike; each discrete grayscale/size jump represents an additional spike. Each subfigure is individually normalized to the bin with the most spikes. \n\n4 Discussion \n\nWe have introduced a procedure for searching for structure in multineuron spike trains, and particularly for searching for statistically reproducible stochastic temporal events among ensembles of neurons. We believe this method has great promise for exploring the important question of ensemble coding in many neuronal systems, a crucial part of the problem of understanding neural information coding. The strengths of this method include the ability to deal with stochastic patterns, the search for any type of reproducible structure including the extraction of patterns of unsuspected nature, and its efficient, greedy search mechanism that allows it to be applied to large numbers of neurons. \n\nAcknowledgements \n\nWe would like to acknowledge Geoff Hinton for useful suggestions in the early stages of this work, David MacKay for helpful comments on an earlier version of the manuscript, and the Sloan Foundation for financial support. \n\nReferences \n\n[Abeles, 1982] Abeles, M. (1982). Local Cortical Circuits: An Electrophysiological Study, volume 6 of Studies of Brain Function. Springer-Verlag. \n\n[Abeles et al., 1995] Abeles, M., Bergman, H., Gat, I., Meilijson, I., Seidemann, E., Tishby, N., and Vaadia, E. (1995). Cortical activity flips among quasi-stationary states. 
Proceedings of the National Academy of Sciences, 92:8616-8620. \n\n[Abeles et al., 1993] Abeles, M., Bergman, H., Margalit, E., and Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. Journal of Neurophysiology, 70(4):1629-1638. \n\n[Abeles and Gerstein, 1988] Abeles, M. and Gerstein, G. L. (1988). Detecting spatiotemporal firing patterns among simultaneously recorded single neurons. Journal of Neurophysiology, 60(3). \n\n[Dayan et al., 1995] Dayan, P., Hinton, G. E., Neal, R. M., and Zemel, R. S. (1995). The Helmholtz machine. Neural Computation, 7:889-904. \n\n[de Sa and Hinton, 1998] de Sa, V. R. and Hinton, G. E. (1998). Cascaded redundancy reduction. To appear in Network (February). \n\n[deCharms and Merzenich, 1996] deCharms, R. C. and Merzenich, M. M. (1996). Primary cortical representation of sounds by the coordination of action-potential timing. Nature, 381:610-613. \n\n[deCharms and Merzenich, 1998] deCharms, R. C. and Merzenich, M. M. (1998). Characterizing neurons in the primary auditory cortex of the awake primate using reverse correlation. This volume. \n\n[Frey et al., 1996] Frey, B. J., Hinton, G. E., and Dayan, P. (1996). Does the wake-sleep algorithm produce good density estimators? In Touretzky, D., Mozer, M., and Hasselmo, M., editors, Advances in Neural Information Processing Systems 8, pages 661-667. MIT Press. \n\n[Gat and Tishby, 1993] Gat, I. and Tishby, N. (1993). Statistical modeling of cell-assemblies activities in associative cortex of behaving monkeys. In Hanson, S., Cowan, J., and Giles, C., editors, Advances in Neural Information Processing Systems 5, pages 945-952. Morgan Kaufmann. \n\n[Gray et al., 1989] Gray, C. M., Konig, P., Engel, A. K., and Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. 
Nature, 338:334-337. \n\n[Hinton et al., 1995] Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268:1158-1161. \n\n[Martignon et al., 1997] Martignon, L., Laskey, K., Deco, G., and Vaadia, E. (1997). Learning exact patterns of quasi-synchronization among spiking neurons from data on multi-unit recordings. In Mozer, M., Jordan, M., and Petsche, T., editors, Advances in Neural Information Processing Systems 9, pages 76-82. MIT Press. \n\n[Meister et al., 1995] Meister, M., Lagnado, L., and Baylor, D. (1995). Concerted signaling by retinal ganglion cells. Science, 270:95-106. \n\n[Schnitzer and Meister,] Schnitzer, M. J. and Meister, M. Information theoretic identification of neural firing patterns from multi-electrode recordings. In preparation. \n\n[Vaadia et al., 1995] Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., and Aertsen, A. (1995). Dynamics of neuronal interactions in monkey cortex in relation to behavioural events. Nature, 373:515-518. ", "award": [], "sourceid": 1442, "authors": [{"given_name": "Virginia", "family_name": "de Sa", "institution": null}, {"given_name": "R.", "family_name": "DeCharms", "institution": null}, {"given_name": "Michael", "family_name": "Merzenich", "institution": null}]}