{"title": "Dynamical segmentation of single trials from population neural data", "book": "Advances in Neural Information Processing Systems", "page_first": 756, "page_last": 764, "abstract": "Simultaneous recordings of many neurons embedded within a  recurrently-connected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function.  In principle, these dynamics might be identified by purely unsupervised, statistical means.  Here, we show that a Hidden Switching Linear Dynamical Systems (HSLDS) model---in which multiple linear dynamical laws approximate a nonlinear and potentially non-stationary dynamical process---is able to distinguish different dynamical regimes within single-trial motor cortical activity associated with the preparation and initiation of hand movements.  The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial.  The HSLDS model also performs better than recent comparable models in predicting the firing rate of an isolated neuron based on the firing rates of others, suggesting that it  captures more of the \"shared variance\" of the data.  Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role.", "full_text": "Dynamical segmentation of single trials\n\nfrom population neural data\n\nBiljana Petreska\n\nGatsby Computational Neuroscience Unit\n\nUniversity College London\n\nbiljana@gatsby.ucl.ac.uk\n\nByron M. Yu\nECE and BME\n\nCarnegie Mellon University\n\nbyronyu@cmu.edu\n\nJohn P. Cunningham\nDept of Engineering\n\nUniversity of Cambridge\njpc74@cam.ac.uk\n\nGopal Santhanam, Stephen I. Ryu\u2020, Krishna V. 
Shenoy\u2021\n\u2021Bioengineering, Neurobiology and Neurosciences Program\n\nElectrical Engineering\n\nStanford University\n\n\u2020Dept of Neurosurgery, Palo Alto Medical Foundation\n{gopals,seoulman,shenoy}@stanford.edu\n\nManeesh Sahani\n\nGatsby Computational Neuroscience Unit\n\nUniversity College London\n\nmaneesh@gatsby.ucl.ac.uk\n\nAbstract\n\nSimultaneous recordings of many neurons embedded within a recurrently-\nconnected cortical network may provide concurrent views into the dynamical pro-\ncesses of that network, and thus its computational function. In principle, these\ndynamics might be identi\ufb01ed by purely unsupervised, statistical means. Here,\nwe show that a Hidden Switching Linear Dynamical Systems (HSLDS) model\u2014\nin which multiple linear dynamical laws approximate a nonlinear and poten-\ntially non-stationary dynamical process\u2014is able to distinguish different dynami-\ncal regimes within single-trial motor cortical activity associated with the prepara-\ntion and initiation of hand movements. The regimes are identi\ufb01ed without refer-\nence to behavioural or experimental epochs, but nonetheless transitions between\nthem correlate strongly with external events whose timing may vary from trial to\ntrial. The HSLDS model also performs better than recent comparable models in\npredicting the \ufb01ring rate of an isolated neuron based on the \ufb01ring rates of others,\nsuggesting that it captures more of the \u201cshared variance\u201d of the data. Thus, the\nmethod is able to trace the dynamical processes underlying the coordinated evo-\nlution of network activity in a way that appears to re\ufb02ect its computational role.\n\n1\n\nIntroduction\n\nWe are now able to record from hundreds\u2014and very likely soon from thousands\u2014of neurons in\nvivo. 
By studying the activity of these neurons in concert we may hope to gain insight not only into\nthe computations performed by specific neurons, but also into the computations performed by the\npopulation as a whole. The dynamics of such collective computations can be seen in the coordinated\nactivity of all of the neurons within the local network, although each individual neuron may\nreflect this coordinated component only noisily. Thus, we hope to identify the computationally-relevant\nnetwork dynamics by purely statistical, unsupervised means\u2014capturing the shared evolution\nthrough latent-variable state-space models [1, 2, 3, 4, 5, 6, 7, 8]. The situation is similar to\nthat of a camera operating at the extreme of its light sensitivity. A single pixel conveys very little\ninformation about an object in the scene, both due to thermal and shot noise and due to the ambiguity\nof the single-channel signal. However, by looking at all of the noisy pixels simultaneously and\nexploiting knowledge about the structure of natural scenes, the task of extracting the object becomes\nfeasible. In a similar way, noisy data from many neurons participating in a local network computation\nneeds to be combined with the learned structure of that computation\u2014embodied by a suitable\nstatistical model\u2014to reveal the progression of the computation.\nNeural spiking activity is usually analysed by averaging across multiple experimental trials, to obtain\na smooth estimate of the underlying firing rates [2, 3, 4, 5]. However, even under carefully\ncontrolled experimental conditions, the animal\u2019s behavior may vary from trial to trial. Reaction\ntime in motor or decision-making tasks, for example, reflects internal processes that can last for\nmeasurably different periods of time. In these cases traditional methods are challenging to apply,\nas there is no obvious way of aligning the data from different trials. 
It is thus essential to develop\nmethods for the analysis of neural data that can account for the timecourse of a neural computation\nduring a single trial. Single-trial methods are also attractive for analysing speci\ufb01c trials in which\nthe subject exhibits erroneous behavior. In the case of a surprisingly long movement preparation\ntime or a wrong decision, it becomes possible to identify the sources of error at the neural level.\nFurthermore, single-trial methods allow the use of more complex experimental paradigms where the\nexternal stimuli can arise at variable times (e.g. variable time delays).\nHere, we study a method for the unsupervised identi\ufb01cation of the evolution of the network com-\nputational state on single trials. Our approach is based on a Hidden Switching Linear Dynamical\nSystem (HSLDS) model, in which the coordinated network in\ufb02uence on the population is captured\nby a low-dimensional latent variable which evolves at each time step according to one of a set of\navailable linear dynamical laws. Similar models have a long history in tracking, speech and, indeed,\nneural decoding applications [9, 10, 11] where they are variously known as Switching Linear Dy-\nnamical System models, Jump Markov models or processes, switching Kalman Filters or Switching\nLinear Gaussian State Space models [12]. We add the pre\ufb01x \u201cHidden\u201d to stress that in our applica-\ntion neither the switching process nor the latent dynamical variable are ever directly observed, and so\nlearning of the parameters of the model is entirely unsupervised\u2014and again, learning in such mod-\nels has a long history [13]. The details of the HSLDS model, inference and learning are reviewed\nin Section 2. In our models, the transitions between linear dynamical laws may serve two purposes.\nFirst, they may provide a piecewise-linear approximation to a more accurate non-linear dynamical\nmodel [14]. 
Second, they may re\ufb02ect genuine changes in the dynamics of the local network, perhaps\ndue to changes in the goals of the underlying computation under the control of signals external to\nthe local area. This second role leads to a computational segmentation of individual trials, as we\nwill see below.\nWe compare the performance of the HSLDS model to Gaussian Processes Factor Analysis (GPFA),\na method introduced by [8] which analyses multi-neuron data on a single-trial basis with similar mo-\ntivation to our own. Instead of explicitly modeling the network computation as a dynamical process,\nGPFA assumes that the computation evolves smoothly in time. In this sense, GPFA is less restrictive\nand would perform better if the HSLDS provided a bad model of the real network dynamics. How-\never GPFA assumes that the latent dimensions evolve independently, making GPFA more restrictive\nthan HSLDS in which the latent dimensions can be coupled. Coupling the latent dynamics intro-\nduces complex interactions between the latent dimensions, which allows a richer set of behaviors.\nTo validate our HSLDS model against GPFA and a single LDS we will use the cross-prediction\nmeasure introduced with GPFA [8] in which the \ufb01ring rate of each neuron is predicted using only\nthe \ufb01ring rates of the rest of the neurons; thus the metric measures how well each model captures the\nshared components of the data. GPFA and cross-prediction are reviewed brie\ufb02y in Section 3, which\nalso introduces the dataset used; and the cross-prediction performance of the models is compared in\nSection 4. 
Having validated the HSLDS approach, we go on to study the dynamical segmentation\nidentified by the model in the rest of Section 4, leading to the conclusions of Section 5.\n\n2 Hidden Switching Linear Dynamical Systems\n\nOur goal is to extract the structure of computational dynamics in a cortical network from the recorded\nfiring rates of a subset of neurons in that network. We use a Hidden Switching Linear Dynamical\nSystems (HSLDS) model to capture the component of those firing rates which is shared by multiple\ncells, thus exploiting the intuition that network computations should be reflected in coordinated\nactivity across a local population. This will yield a latent low-dimensional subspace of dynamical\nstates embedded within the space of noisy measured firing rates, along with a model of the dynamics\nwithin that latent space. The dynamics of the HSLDS model combine a number of linear dynamical\nsystems (LDS), each of which captures linear Markovian dynamics using a first-order linear\nauto-regressive (AR) rule [9, 15]. By combining multiple such rules, the HSLDS model can provide a\npiecewise linear approximation to nonlinear dynamics, and also capture changes in the dynamics of\nthe local network driven by external influences that presumably reflect task demands. In the model\nimplemented here, transitions between LDS rules themselves form a Markov chain.\nLet x_{:,t} \u2208 \u211d^{p\u00d71} be the low-dimensional computational state that we wish to estimate. This latent\ncomputational state reflects the network-level computation performed at timepoint t that gives rise\nto the observed spiking activity y_{:,t} \u2208 \u211d^{q\u00d71}. Note that the dimensionality of the computational state\np is lower than the dimensionality of the recorded neural data q, which corresponds to the number of\nrecorded neurons. 
The evolution of the computational state x_{:,t} is given by\n\nx_{:,t} | x_{:,t\u22121}, s_t \u223c N(A_{s_t} x_{:,t\u22121}, K_{s_t})    (1)\n\nwhere N(\u00b5, \u03a3) denotes a Gaussian distribution with mean \u00b5 and covariance \u03a3. The linear dynamical\nmatrices A_{s_t} \u2208 \u211d^{p\u00d7p} and innovations covariance matrices K_{s_t} \u2208 \u211d^{p\u00d7p} are parameters of the\nmodel and need to be learned. These matrices are indexed by a switch variable s_t \u2208 {1, ..., S} such\nthat different A_{s_t} and K_{s_t} need to be learned for each of the S possible linear dynamical systems.\nIf the dependencies on s_t are removed, Eq. 1 defines a single LDS.\nThe switch variable s_t specifies which linear dynamical law guides the evolution of the latent state\nx_{:,t} at timepoint t and as such provides a piecewise approximation to the nonlinear dynamics with\nwhich x_{:,t} may evolve. The variable s_t itself is drawn from a Markov transition matrix M learned\nfrom the data:\n\ns_t \u223c Discrete(M_{:,s_{t\u22121}})\n\nAs mentioned above, the observed neural activity y_{:,t} \u2208 \u211d^{q\u00d71} is generated by the latent dynamics\nand denotes the spike counts (Gaussianised as described below) of q simultaneously recorded neurons\nat timepoints t \u2208 {1, ..., T}. The observations y_{:,t} are related to the latent computational states\nx_{:,t} through a linear-Gaussian relationship:\n\ny_{:,t} | x_{:,t} \u223c N(Cx_{:,t} + d, R),\n\nwhere the observation matrix C \u2208 \u211d^{q\u00d7p}, offset d \u2208 \u211d^{q\u00d71}, and covariance matrix R \u2208 \u211d^{q\u00d7q} are\nmodel parameters that need to be learned. We force R to be diagonal and to keep track only of the\nindependent noise variances. This means that the firing rates of different neurons are independent\nconditioned on the latent dynamics, compelling the shared variance to live only in the latent space.\nNote that different neurons can have different independent noise variances. 
We use a Gaussian\nrelationship instead of a point-process likelihood model for computational tractability. Finally, the\nobservation dynamics do not depend on which linear dynamical system is used (i.e., are independent\nof s_t). A graphical model of the particular HSLDS instance we have used is shown in Figure 1.\n\nFigure 1: Graphical model of the HSLDS. The first layer corresponds to the discrete switch variable\nthat dictates which of the S available linear dynamical systems (LDSs) will guide the latent dynamics\nshown in the second layer. The latent dynamics evolves as a linear dynamical system at timepoint t\nand presumably captures relevant aspects of the computation performed at the level of the recorded\nneural network. The relationship between the latent dynamics and neural data (third layer) is again\nlinear-Gaussian, such that each computational state is associated with a specific denoised firing pattern.\nThe dimensionality of the latent dynamics x is lower than that of the observations y (equivalent to\nthe number of recorded neurons), meaning that x extracts relevant features reflected in the shared\nvariance of y. Note that there are no connections between x_{t\u22121} and s_t, nor s_t and y.\n\nInference and learning in the model are performed by approximate Expectation Maximisation (EM).\nInference (or the E-step) requires finding appropriate expected sufficient statistics under the distributions\nof the computational latent state and switch variable at each point in time given the observed\nneural data p(x_{1:T}, s_{1:T} | y_{1:T}). Inference in the HSLDS is computationally intractable because of the\nfollowing exponential complexity. At the initial timepoint, s_0 can take one of S discrete values. At\nthe next timepoint, each of the S possible latent states can again evolve according to S different linear\ndynamical laws, such that at timepoint t we need to keep track of S^t possible solutions. To avoid\nthis exponential scaling, we use an approximate inference algorithm based on Assumed Density Filtering\n[16, 17, 18] and Assumed Density Smoothing [19]. The algorithm comprises a single forward\npass that estimates the filtered posterior distribution p(x_t, s_t | y_{1:t}) and a single backward pass that\nestimates the smoothed posterior distribution p(x_t, s_t | y_{1:T}). The key idea is to approximate these\nposterior distributions by a simple tractable form such as a single Gaussian. The approximated distribution\nis then propagated through time conditioned on the new observation. The smoothing step\nrequires an additional simplifying assumption where p(x_{t+1} | s_t, s_{t+1}, y_{1:T}) \u2248 p(x_{t+1} | s_{t+1}, y_{1:T}) as\nproposed in [19]. It is also possible to use a mixture of a fixed number of Gaussians as the approximating\ndistribution, at the cost of greater computational time. We found that this approach yielded\nsimilar results in pilot runs, and thus retained the single-Gaussian approximation.\nLearning the model parameters (or the M-step) can be performed using the standard procedure of\nmaximizing the expected joint log-likelihood:\n\n\u2211_{n=1}^{N} \u27e8log p(x^n_{1:T}, y^n_{1:T})\u27e9_{p_old(x^n | y^n)}\n\nwith respect to the parameters A_{s_t}, K_{s_t}, M, C, d and R, where the superscript n indexes data from\neach of N different trials. In practice, the estimated individual variance of particularly low-firing\nneurons was very low and likely to be incorrectly estimated. Therefore we assumed a Wishart prior\non the observation covariance matrix R, which resulted in an update rule that adds a fixed parameter\n\u03c8 \u2208 \u211d to all of the values on the diagonal. In the analyses below \u03c8 was fixed to the value that\ngave the best cross-prediction results (see Section 3.2). 
Finally, the most likely state of the switch\nvariable, s^\u2217_{1:T} = arg max_{s_{1:T}} p(s_{1:T} | y_{1:T}), was estimated using the standard Viterbi algorithm [20],\nwhich ensures that the most likely switch variable path is in fact possible in terms of the transitions\nallowed by M.\n\n3 Model Comparison and Experimental Data\n\n3.1 Gaussian Process Factor Analysis\n\nBelow, we compare the performance of the HSLDS model to Gaussian Process Factor Analysis\n(GPFA), another method for estimating the functional computation of a set of neurons. GPFA is an\nextension of Factor Analysis that leverages time-label information, introduced in [8]. In this model,\nthe latent dynamics evolve as a Gaussian Process (GP), with a smooth correlation structure between\nthe latent states at different points in time. FA and the GP prior work together\nto identify smooth low-dimensional latent trajectories.\nFormally, each dimension of the low-dimensional latent states x_{:,t} is indexed by i \u2208 {1, ..., p} and\ndefines a separate GP:\n\nx_{i,:} \u223c N(0, K_i)\n\nwhere x_{i,:} \u2208 \u211d^{1\u00d7T} is the trajectory in time of the ith latent dimension and K_i \u2208 \u211d^{T\u00d7T} is the\nith GP smoothing covariance matrix. K_i is set to the commonly-used squared exponential (SE)\ncovariance function as defined in [8].\nWhereas HSLDS explicitly models the dynamics of the network computation, GPFA only assumes\nthat the evolution of the computational state is smooth. Thus GPFA is a less restrictive model than\nHSLDS, but being model-free makes it also less informative of the dynamical rules that underlie the\ncomputation. A major advantage of GPFA over HSLDS is that the solution is approximation-free\nand faster to run.\n\n3.2 Cross-prediction performance measure\n\nTo compare model goodness-of-fit we adopt the cross-prediction metric of [8]. 
All of these models\nattempt to capture the shared variance in the data, and so performance may be measured by how well\nthe activity of one neuron can be predicted using the activity of the rest of the neurons. It is important\nto measure the cross-prediction error on trials that have not been used for learning the parameters\nof the model. We arrange the observed neural data in a matrix Y = [y_{:,1}, ..., y_{:,T}] \u2208 \u211d^{q\u00d7T} where\neach row y_{j,:} represents the activity of neuron j in time. The model cross-prediction for this neuron\nj is \u0177_{j,:} = E[y_{j,:} | Y_{\u2212j,:}] where Y_{\u2212j,:} \u2208 \u211d^{(q\u22121)\u00d7T} represents all but the jth row of Y. We first\nestimate the trajectories in the latent space using all but the jth neuron, p(x_{1:p,:} | Y_{\u2212j,:}), in a set of\ntesting trials. We then project this estimate back to the high-dimensional space to obtain the model\ncross-prediction \u0177_{j,:} using \u0177_{j,t} = C_{j,:} \u00b7 E[x_{:,t} | Y_{\u2212j,:}] + d_j. The error is computed as the sum-of-squared\nerrors between the model cross-prediction and the observed Gaussianised spike counts\nacross all neurons and timepoints; and we plot the difference between this error (per time bin) and\nthe average temporal variance of the corresponding neuron in the corresponding trial (denoted as\nVar-MSE).\nNote that the performance of different models can be evaluated as a function of the dimensionality\nof the latent state. The HSLDS model has two further free parameters which influence cross-prediction\nperformance: the number of available LDSs S and the concentration of the Wishart prior\n\u03c8.\n\n3.3 Data\n\nWe applied the model to data recorded in the premotor and motor cortices of a rhesus macaque while\nit performed a delayed center-out reach task. A trial began with the animal touching and looking at\nan illuminated point at the center of a vertically oriented screen. 
A target was then illuminated at a\ndistance of 10 cm and in one of seven directions (0\u00b0, 45\u00b0, 90\u00b0, 135\u00b0, 180\u00b0, 225\u00b0, 315\u00b0) away from this central\nstarting point. The target remained visible while the animal prepared but withheld a movement to\ntouch it. After a random delay of between 200 and 700 ms, the illumination of the starting point was\nextinguished, which was the animal\u2019s cue (the \u201cgo cue\u201d) to reach to the target to obtain a reward.\nNeural activity was recorded from 105 single and multi-units, using a 96-electrode array (Blackrock,\nSalt Lake City, UT). All active units were included in the analysis without selection based on tuning.\nThe spike-counts were binned at a relatively fine time-scale of 10 ms (non-overlapping bins). As in\n[8], the observations were taken to be the square-roots of these spike counts, a transformation that\nhelps to Gaussianise and stabilise the variance of count data [21].\n\n4 Results\n\nWe first compare the cross-prediction-derived goodness-of-fit of the HSLDS model to that of the\nsingle LDS and GPFA models in Section 4.1. We find that HSLDS provides a better model of the\nshared component of the recorded data than do the two other methods. We then study the dynamical\nsegmentation found by the HSLDS model, first by looking at a typical example (Section 4.2) and\nthen by correlating dynamical switches to behavioural events (Section 4.3). We show that the latent\ntrajectories and dynamical transitions estimated by the model predict reaction time, a behavioral\ncovariate that varies from trial to trial. Finally we argue that these behavioral correlates are difficult\nto obtain using a standard neural analysis method.\n\nFigure 2: Performance of the HSLDS (green solid line), LDS (blue dashed) and\nGPFA (red dash-dotted) models. Analyses are based on one movement type with\nthe target in the 45\u00b0 direction. Cross-prediction error was computed using 4-fold\ncross-validation. HSLDS with different values of S also outperformed the\nLDS case (which is equivalent to S = 1). Performance was more sensitive to the\nstrength \u03c8 of the Wishart prior, and the best performing model is shown.\n\n4.1 Cross-prediction\n\nTo validate the HSLDS model we compared it to the GPFA model described in Section 3.1 and a\nsingle LDS model. Since all of these models attempt to capture the shared variance of the data\nacross neurons and multiple trials, we used cross-prediction to measure their performance. Cross-prediction\nlooks at how well the spiking activity of one neuron is predicted just by looking at the\nspiking activity of all of the other neurons (described in detail in Section 3.2). We found that both the\nsingle LDS and HSLDS models, which allow for coupled latent dynamics, do better than GPFA (Figure 2).\nThis could be attributed to the fact that GPFA constrains the different dimensions\nof the latent computational state to evolve independently. The HSLDS model also outperforms a\nsingle LDS, yielding the lowest prediction error for all of the latent dimensionalities we have looked at,\nsuggesting that a nonlinear model of the latent dynamics is better than a linear model. Note that the\nminimum prediction error asymptotes after 10 latent dimensions. It is tempting to suggest that for\nthis particular task the effective dimensionality of the spiking activity is much lower than that of\nthe 105 recorded neurons, thereby justifying the use of a low-dimensional manifold to describe the\nunderlying computation. 
This could be interpreted as evidence that neurons may carry redundant\ninformation and that the (nonlinear) computational function of the network is better reflected at the\nlevel of the population of neurons, rather than in single neurons.\n\n4.2 Data segmentation\n\nBy definition, the HSLDS model partitions the latent dynamics underlying the observed data into\ntime-labeled segments that may evolve linearly. The segments found by HSLDS correspond to\nperiods of time in which the latent dynamics seem to evolve according to different linear dynamical\nlaws, suggesting that the observed firing pattern of the network has changed as a whole. Thus, by\nconstruction, the HSLDS model can subdivide the network activity into different firing regimes for\neach trial specifically.\nFor the purpose of visualization, we have applied an additional orthonormalization post-processing\nstep (as in [8]) that helps us order the latent dimensions according to the amount of covariance explained.\nThe orthonormalization consists of finding the singular-value decomposition of C, allowing\nus to write the product Cx_{:,t} as U_C(D_C V_C\u2032 x_{:,t}), where U_C \u2208 \u211d^{q\u00d7p} is a matrix with orthonormal\ncolumns. We will refer to x\u0303_{:,t} = D_C V_C\u2032 x_{:,t} as the orthonormalised latent state at time t. The first\ndimension of the orthonormalised latent state in time x\u0303_{1,:} corresponds then to the latent trajectory\nwhich explains the most covariance. Since the columns of U_C are orthonormal, the relationship\nbetween the orthonormalised latent trajectories and observed data can be interpreted in an intuitive\nway, similarly to Principal Components Analysis (PCA). 
The results presented here were obtained\nby setting the number of switching LDSs S, latent space dimensionality p and Wishart prior \u03c8 to\nvalues that yielded a reasonably low cross-prediction error.\n\n[Plot: Var-MSE (\u00d710^{\u22123}) against latent state dimensionality p, for HSLDS (S = 7), LDS and GPFA.]\n\nFigure 3: HSLDS applied to neural data from the 45\u00b0 direction movement (S = 7, p = 7, \u03c8 = 0.05).\nThe first dimension of the orthonormalised latent trajectory is shown. The colors denote the different\nlinear dynamical systems used by the model. Each line is a different trial, aligned to the target onset\n(left) and go cue (right), and sorted by reaction time. Switches reliably follow the target onset and\nprecede the movement onset, with a time lag that is correlated with reaction time.\n\nFigure 3 shows a typical example of the HSLDS model applied to data in one movement direction,\nwhere the different trials are fanned out vertically for illustration purposes. The first orthonormalized\nlatent dimension indicates a transient in the recorded population activity shortly after target onset\n(which is marked by the red dots) and a sustained change of activity after the go cue (marked by\nthe green dots). The colours of the lines indicate the most likely setting of the switching variable\nat each time. It is evident that the learned solution segments each trial into a broadly reproducible\nsequence of dynamical epochs. Some transitions appear to reliably follow or precede external events\n(even though these events were not used to train the segmentation) and may reflect actual changes\nin dynamics due to external influences. Others seem to follow each other in quick succession, and\nmay instead reflect linear approximations to non-linear dynamical processes\u2014evident particularly\nduring transiently rapid changes in the latent state. 
Unfortunately, the current model does not allow\nus to distinguish quantitatively between these two types of transition.\nNote that the delays (time from target onset to go cue) used in the experiment varied from 200 to\n700ms, such that the model systematically detected a change in the neural \ufb01ring rates shortly after\nthe go cue appeared on each individual trial. The model succeeds at detecting these changes in a\npurely unsupervised fashion as it was not given any time information about the external experimental\ninputs.\n\n4.3 Behavioral correlates during single trials\n\nIt is not surprising that the \ufb01ring rates of the recorded neurons change during different behavioral\nperiods. For example, neural activity is often observed to be higher during movement execution\nthan during movement preparation. However, the HSLDS method reliably detects the behaviourally-\ncorrelated changes in the pattern of neural activity across many neurons on single trials.\nIn order to ensure that HSLDS captures trial-speci\ufb01c information we have looked at whether the\ntime post-go-cue at which the model estimates a \ufb01rst switch in the neural dynamics could predict\nthe subsequent onset of movement and thus the trial reaction time (RT). We found that the \ufb01ltered\nmodel (which does not incorporate spiking data from future times into its estimate of the switching\nvariable) could explain 52% of the reaction time variance on average, across the 7 reach directions\n(Figure 4).\nCould a more conventional approach do better? We attempted to use a combination of the \u201cpop-\nulation vector\u201d (PV) method and the \u201crise-to-threshold\u201d hypothesis. The PV sums the preferred\ndirections of a population of neurons, weighted by the respective spike counts in order to decode the\nrepresented direction of movement [22]. 
The rise-to-threshold hypothesis asserts that neural firing\nrates rise during a preparatory period and movement is initiated when the population rate crosses a\nthreshold [23]. The neural data used for this analysis were smoothed with a Gaussian window and\nsampled at 1 ms. We first estimated the preferred direction p\u0302_q of the neuron indexed by q as the\nunit vector in the direction of p\u20d7_q = \u2211_{d=1}^{7} r^d_q v\u20d7_d, where d indexes the instructed movement direction\nv\u20d7_d and r^d_q is the mean firing rate of neuron q during all movements in direction d. The preferred\ndirection of a given neuron often differed between plan and movement activity, so we used data from\nmovement onset until the movement end to estimate r^d_q, as this gave us better results when trying to\nestimate a threshold in the rising movement-related activity. We then estimated the instantaneous\namplitude of the network PV at time t as s^d_t = ||\u2211_{q=1}^{Q} y_{q,t} p\u20d7_q||, where y_{q,t} is the smoothed spike\ncount of neuron q at time t, Q is the number of neurons and ||w\u20d7|| denotes the norm of the vector w\u20d7.\nFinally, we searched for a threshold length (one per direction), such that the time at which the PV\nexceeded this length on each trial was best correlated with RT.\n\nFigure 4: Correlation (R^2 = 0.52) between\nthe reaction time and first filtered HSLDS\nswitch following the go cue, on a trial-by-trial\nbasis and averaged across directions.\nSymbols correspond to movements in different\ndirections. Note that in two catch trials\nthe model did not switch following the go\ncue, so we considered the last switch before\nthe cue.\n\nNote that this approach uses considerable supervision that was denied to the HSLDS model. First,\nthe movement epoch of each trial was identified to define the PV. Second, the thresholds were\nselected so as to maximize the RT correlation\u2014a direct form of supervision. 
Finally, this selection\nwas based on the same data as were used to evaluate the correlation score, thus leading to potential\noverfitting in the explained variance. The HSLDS model was also trained on the same trials, which\ncould lead to some overfitting in terms of likelihood, but should not introduce overfitting in the\ncorrelation between switch times and RT, which is not directly optimised.\nDespite these considerable advantages, the PV approach did not predict RT as well as did the\nHSLDS, yielding an average variance explained across conditions of 48%.\n\n5 Conclusion\n\nIt appears that the Hidden Switching Linear Dynamical System (HSLDS) model is able to appropriately\nextract relevant aspects of the computation reflected in a network of firing neurons. HSLDS\nexplicitly models the nonlinear dynamics of the computation as a piecewise linear process that captures\nthe shared variance in the neural data across neurons and multiple trials.\nOne limitation of HSLDS is the approximate EM algorithm used for inference and learning of the\nmodel parameters. We have traded accuracy for computational tractability, such that the model\nmay settle into a solution that is simpler than the optimum. A second limitation of HSLDS is the\nslow training time of EM, which requires offline learning of the model parameters.\nDespite these simplifications, HSLDS can be used to dynamically segment the neural activity at the\nlevel of the whole population of neurons into periods of different firing regimes. We showed that in a\ndelayed-reach task the firing regimes found correlate well with the experimental behavioral periods.\nThe computational trajectories found by HSLDS are trial-specific and have a dimensionality that\nis more suitable for visualization than the high-dimensional spiking activity. Overall, HSLDS models are\nattractive for uncovering behavioral correlates in neural data on a single-trial basis.\nAcknowledgments. 
This work was supported by DARPA REPAIR (N66001-10-C-2010), the Swiss\nNational Science Foundation Fellowship PBELP3-130908, the Gatsby Charitable Foundation, UK EPSRC\nEP/H019472/1 and NIH-NINDS-CRCNS-R01, NDSEG and NSF Graduate Fellowships, Christopher and Dana\nReeve Foundation. We are very grateful to Jacob Macke, Lars Buesing and Alexander Lerchner for discussion.\n\nReferences\n[1] A. C. Smith and E. N. Brown. Estimating a state-space model from point process observations. Neural Computation, 15(5):965\u2013991, 2003.\n[2] M. Stopfer, V. Jayaraman, and G. Laurent. Intensity versus identity coding in an olfactory system. Neuron, 39:991\u20131004, 2003.\n[3] S. L. Brown, J. Joseph, and M. Stopfer. Encoding a temporally structured stimulus with a temporally structured neural representation. Nature Neuroscience, 8(11):1568\u20131576, 2005.\n[4] R. Levi, R. Varona, Y. I. Arshavsky, M. I. Rabinovich, and A. I. Selverston. The role of sensory network dynamics in generating a motor program. Journal of Neuroscience, 25(42):9807\u20139815, 2005.\n[5] O. Mazor and G. Laurent. Transient dynamics versus fixed points in odor representations by locust antennal lobe projection neurons. Neuron, 48:661\u2013673, 2005.\n[6] B. M. Broome, V. Jayaraman, and G. Laurent. Encoding and decoding of overlapping odor sequences. Neuron, 51:467\u2013482, 2006.\n[7] M. M. Churchland, B. M. Yu, M. Sahani, and K. V. Shenoy. Techniques for extracting single-trial activity patterns from large-scale neural recordings. Current Opinion in Neurobiology, 17(5):609\u2013618, 2007.\n[8] B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. J Neurophysiol, 102:614\u2013635, 2009.\n[9] Y. Bar-Shalom and Xiao-Rong Li. Estimation and Tracking: Principles, Techniques and Software.
Artech House, Norwood, MA, 1998.\n[10] B. Mesot and D. Barber. Switching linear dynamical systems for noise robust speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 15(6):1850\u20131858, 2007.\n[11] W. Wu, M. J. Black, D. Mumford, Y. Gao, E. Bienenstock, and J. P. Donoghue. Modeling and decoding motor cortical activity using a switching Kalman filter. IEEE Transactions on Biomedical Engineering, 51(6):933\u2013942, 2004.\n[12] D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press. In Press, 2011.\n[13] K. P. Murphy. Switching Kalman filters. Technical Report 98-10, Compaq Cambridge Research Lab, 1998.\n[14] B. M. Yu, A. Afshar, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani. Extracting dynamical structure embedded in neural activity. In Y. Weiss, B. Sch\u00f6lkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1545\u20131552. Cambridge, MA: MIT Press, 2006.\n[15] M. West and J. Harrison. Bayesian Forecasting and Dynamic Models. Springer, 1999.\n[16] D. L. Alspach and H. W. Sorenson. Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Transactions on Automatic Control, 17(4):439\u2013448, 1972.\n[17] X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence - UAI, volume 17, pages 33\u201342. Morgan Kaufmann, 1998.\n[18] T. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD Thesis, MIT Media Lab, 2001.\n[19] D. Barber. Expectation correction for smoothed inference in switching linear dynamical systems. Journal of Machine Learning Research, 7:2515\u20132540, 2006.\n[20] A. J. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, IT-13:260\u2013267, 1967.\n[21] N. A. Thacker and P. A. Bromiley.
The effects of a square root transform on a Poisson distributed quantity. Technical Report 2001-010, University of Manchester, 2001.\n[22] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner. Neuronal population coding of movement direction. Science, 233:1416\u20131419, 1986.\n[23] W. Erlhagen and G. Sch\u00f6ner. Dynamic field theory of movement preparation. Psychol Rev, 109:545\u2013572, 2002.\n", "award": [], "sourceid": 507, "authors": [{"given_name": "Biljana", "family_name": "Petreska", "institution": null}, {"given_name": "Byron", "family_name": "Yu", "institution": null}, {"given_name": "John", "family_name": "Cunningham", "institution": null}, {"given_name": "Gopal", "family_name": "Santhanam", "institution": null}, {"given_name": "Stephen", "family_name": "Ryu", "institution": null}, {"given_name": "Krishna", "family_name": "Shenoy", "institution": null}, {"given_name": "Maneesh", "family_name": "Sahani", "institution": null}]}