{"title": "Linear dynamical neural population models through nonlinear embeddings", "book": "Advances in Neural Information Processing Systems", "page_first": 163, "page_last": 171, "abstract": "A body of recent work in modeling neural activity focuses on recovering low-dimensional latent features that capture the statistical structure of large-scale neural populations. Most such approaches have focused on linear generative models, where inference is computationally tractable. Here, we propose fLDS, a general class of nonlinear generative models that permits the firing rate of each neuron to vary as an arbitrary smooth function of a latent, linear dynamical state. This extra flexibility allows the model to capture a richer set of neural variability than a purely linear model, but retains an easily visualizable low-dimensional latent space. To fit this class of non-conjugate models we propose a variational inference scheme, along with a novel approximate posterior capable of capturing rich temporal correlations across time. We show that our techniques permit inference in a wide class of generative models. We also show in application to two neural datasets that, compared to state-of-the-art neural population models, fLDS captures a much larger proportion of neural variability with a small number of latent dimensions, providing superior predictive performance and interpretability.", "full_text": "Linear dynamical neural population models through nonlinear embeddings

Yuanjun Gao*1, Evan Archer*12, Liam Paninski12, John P. Cunningham12
Department of Statistics1 and Grossman Center2, Columbia University, New York, NY, United States
yg2312@columbia.edu, evan@stat.columbia.edu, liam@stat.columbia.edu, jpc2181@columbia.edu

Abstract

A body of recent work in modeling neural activity focuses on recovering low-dimensional latent features that capture the statistical structure of large-scale neural populations.
Most such approaches have focused on linear generative models, where inference is computationally tractable. Here, we propose fLDS, a general class of nonlinear generative models that permits the firing rate of each neuron to vary as an arbitrary smooth function of a latent, linear dynamical state. This extra flexibility allows the model to capture a richer set of neural variability than a purely linear model, but retains an easily visualizable low-dimensional latent space. To fit this class of non-conjugate models we propose a variational inference scheme, along with a novel approximate posterior capable of capturing rich temporal correlations across time. We show that our techniques permit inference in a wide class of generative models. We also show in application to two neural datasets that, compared to state-of-the-art neural population models, fLDS captures a much larger proportion of neural variability with a small number of latent dimensions, providing superior predictive performance and interpretability.

1 Introduction

Until recently, neural data analysis techniques focused primarily upon the analysis of single neurons and small populations. However, new experimental techniques enable the simultaneous recording of ever-larger neural populations (at present, hundreds to tens of thousands of neurons). Access to these high-dimensional data has spurred a search for new statistical methods. One recent approach has focused on extracting latent, low-dimensional dynamical trajectories that describe the activity of an entire population [1, 2, 3]. The resulting models and techniques permit tractable analysis and visualization of high-dimensional neural data.
Further, applications to motor cortex [4] and visual cortex [5, 6] suggest that the latent trajectories recovered by these methods can provide insight into underlying neural computations.

Previous work on inferring latent trajectories has considered models with latent linear dynamics that couple to observations either linearly or through a restricted nonlinearity [1, 3, 7]. When the true data-generating process is nonlinear (for example, when neurons respond nonlinearly to a common, low-dimensional unobserved stimulus), the observations may lie in a low-dimensional nonlinear subspace that cannot be captured by a mismatched observation model, hampering the ability of latent linear models to recover the low-dimensional structure from the data. Here, we propose fLDS, a new approach to inferring latent neural trajectories that generalizes several previously proposed methods. As in previous methods, we model the latent dynamical state with a linear dynamical system (LDS) prior. But, under our model, each neuron's spike rate is permitted to vary as an arbitrary smooth nonlinear function of the latent state. By permitting each cell to express its own, private nonlinear response properties, our approach seeks to find a nonlinear embedding of a neural time series into a linear-dynamical state space.

To perform inference in this nonlinear model we adapt recent advances in variational inference [8, 9, 10]. Using a novel approximate posterior that is capable of capturing rich correlation structure in time, our techniques can be applied to a large class of latent-LDS models.

*These authors contributed equally.
30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
We show that our variational inference approach, when applied to learn generative models that predominate in the neural data analysis literature, performs comparably to inference techniques designed for a specific model. More interestingly, we show in both simulation and application to two neural datasets that our fLDS modeling framework yields higher prediction performance with a more compact and informative latent representation, as compared to state-of-the-art neural population models.

2 Notation and overview of neural data

Neuronal signals take the form of temporally fast (~1 ms) spikes that are typically modeled as discrete events. Although the spiking response of individual neurons has been the focus of intense research, modern experimental techniques make it possible to study the simultaneous activity of large numbers of neurons. In real data analysis, we usually discretize time into small bins of duration Δt and represent the response of a population of n neurons at time t by a vector x_t of length n, whose ith entry is the number of spikes recorded from neuron i in time bin t, where i ∈ {1, ..., n}, t ∈ {1, ..., T}. Additionally, because spike responses are variable even under identical experimental conditions, it is commonplace to record many repeated trials, r ∈ {1, ..., R}, of the same experiment. Here, we denote x_{rt} = (x_{rt1}, ..., x_{rtn})ᵀ ∈ ℕⁿ as the spike counts of the n neurons at time t in trial r. When the time index is suppressed, we refer to the data matrix x_r = (x_{r1}, ..., x_{rT}) ∈ ℕ^{T×n}. We also use x = (x_1, ..., x_R) ∈ ℕ^{T×n×R} to denote all the observations.
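The binning convention above can be illustrated with a short sketch. This is a minimal toy example: the helper name `bin_spikes`, the spike times, and the bin size are all made up for illustration.

```python
import numpy as np

def bin_spikes(spike_times, t_max, dt):
    """Bin per-neuron spike times (in seconds) into a T x n count matrix x,
    where x[t, i] is the number of spikes of neuron i in time bin t."""
    n = len(spike_times)
    T = int(np.ceil(t_max / dt))
    edges = np.arange(T + 1) * dt          # bin boundaries 0, dt, 2*dt, ...
    x = np.zeros((T, n), dtype=int)
    for i, times in enumerate(spike_times):
        x[:, i] = np.histogram(times, bins=edges)[0]
    return x
```

A trial recorded for 30 ms at Δt = 10 ms with two neurons then yields a 3 × 2 count matrix.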
We use analogous notation for other temporal variables; for instance z_r and z.

3 Review of latent LDS neural population models

Latent factor models are popular tools in neural data analysis, where they are used to infer low-dimensional, time-evolving latent trajectories (or factors) z_{rt} ∈ ℝ^m, m ≪ n, that capture a large proportion of the variability present in a neural population recording. Many recent techniques follow this general approach, with distinct noise models [3], different priors on the latent factors [11, 12], extra model structure [13], and so on.

We focus upon one thread of this literature that takes its inspiration directly from the classical Kalman filter. Under this approach, the dynamics of a population of n neurons are modulated by an unobserved, linear dynamical system (LDS) with an m-dimensional latent state z_{rt} that evolves according to

z_{r1} ~ N(μ_1, Q_1),    (1)
z_{r(t+1)} | z_{rt} ~ N(A z_{rt}, Q),    (2)

where A is an m × m linear dynamics matrix, and the matrices Q_1 and Q are the covariances of the initial state and the Gaussian innovation noise, respectively. The spike count observation is then related to the latent state via an observation model,

x_{rti} | z_{rt} ~ P(λ_{rti} = [f(z_{rt})]_i),    (3)

where [f(z_{rt})]_i is the ith element of a deterministic "rate" function f : ℝ^m → ℝ^n, and P(λ) is a noise model with parameter λ. Each choice of the ingredients f and P leads to a model with distinct characteristics. When P is a Gaussian distribution with mean parameter λ and the rate function f is linear, the model reduces to the classical Kalman filter. All operations in the Kalman filter are conjugate, and inference may be performed in closed form. However, any non-Gaussian noise model P or nonlinear rate function f breaks conjugacy and necessitates the use of approximate inference techniques.
This is generally the case for neural models, where the discrete, positive nature of spikes suggests the use of discrete noise models with a positive link [1, 3].

Examples of latent LDS models for neural populations: Existing LDS models usually impose strong assumptions on the rate function. When P is chosen to be Poisson with f(z_{rt}) the (element-wise) exponential of a linear transformation of z_{rt}, we recover the Poisson linear dynamical system model (PLDS) [1],

x_{rti} | z_{rt} ~ Poisson(λ_{rti} = exp(c_i z_{rt} + d_i)),    (4)

where c_i is the ith row of the n × m observation matrix C and d_i ∈ ℝ is the baseline firing rate of neuron i. With P chosen to be a generalized count (GC) distribution and linear rate f, the model is called the generalized count linear dynamical system (GCLDS) [3],

x_{rti} | z_{rt} ~ GC(λ_{rti} = c_i z_{rt}, g_i(·)),    (5)

where GC(λ, g(·)) is a distribution family parameterized by λ ∈ ℝ and a function g(·) : ℕ → ℝ, distributed as

p_GC(k; λ, g(·)) = exp(λk + g(k)) / (k! M(λ, g(·))),  k ∈ ℕ,    (6)

where M(λ, g(·)) = Σ_{k=0}^∞ exp(λk + g(k)) / k! is the normalizing constant. The GC model can flexibly capture under- and over-dispersed count distributions.

4 Nonlinear latent variable models for neural populations

4.1 Generative model: Linear dynamical system with nonlinear observation

We relax the linearity assumptions of the previous LDS-based neural population models by incorporating a per-neuron rate function. We retain the latent LDS of eq. 1 and eq. 2, but select an observation model such that each neuron has a separate nonlinear dependence upon the latent variable,

x_{rti} | z_{rt} ~ P(λ_{rti} = [f(z_{rt})]_i),    (7)

where P(λ) is a noise model with parameter λ; f : ℝ^m → ℝ^n is an arbitrary continuous function from the latent state into the spike rate; and [f(z_{rt})]_i is the ith element of f(z_{rt}).
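To make the generative model concrete, here is a minimal sketch that samples one trial from a PfLDS-style model: linear-Gaussian latent dynamics (eqs. 1-2) pushed through a per-neuron nonlinear rate function into Poisson counts (eq. 7). The network architecture (one tanh hidden layer with random weights) and all dimensions are illustrative assumptions, not the fitted networks used in the paper.

```python
import numpy as np

def sample_flds(T=100, m=2, n=20, seed=0):
    """Sample one trial: LDS latents -> per-neuron nonlinear rate -> Poisson."""
    rng = np.random.default_rng(seed)
    A = 0.99 * np.eye(m)                   # stable latent dynamics (eq. 2)
    Q = 0.01 * np.eye(m)                   # innovation covariance
    z = np.zeros((T, m))
    z[0] = rng.multivariate_normal(np.zeros(m), Q)
    for t in range(T - 1):
        z[t + 1] = rng.multivariate_normal(A @ z[t], Q)
    # rate function f: a random one-hidden-layer network, one output per neuron
    W1 = rng.normal(size=(m, 32)); b1 = rng.normal(size=32)
    W2 = rng.normal(size=(32, n)) / np.sqrt(32); b2 = np.full(n, -1.0)
    rates = np.exp(np.tanh(z @ W1 + b1) @ W2 + b2)   # (T, n), strictly positive
    x = rng.poisson(rates)                            # spike counts (eq. 7)
    return z, rates, x
```

The exponential output guarantees positive rates; any other smooth positive map would do equally well here.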
In principle, the rate function may be represented using any technique for function approximation. Here, we represent f(·) through a feed-forward neural network model with parameters ψ, which amount to the weights and biases of all units across all layers. For the remainder of the text, we use θ to denote all generative model parameters: θ = (μ_1, Q_1, A, Q, ψ). We refer to this class of models as fLDS. To refer to an fLDS with a given noise model P, we prepend the noise model to the acronym. In the experiments, we will consider both PfLDS (taking P to be Poisson) and GCfLDS (taking P to be a generalized count distribution).

4.2 Model fitting: Auto-encoding variational Bayes (AEVB)

Our goal is to learn the model parameters θ and to infer the posterior distribution over the latent variables z. Ideally, we would perform maximum likelihood estimation of the parameters, θ̂ = argmax_θ log p_θ(x) = argmax_θ Σ_{r=1}^R log ∫ p_θ(x_r, z_r) dz_r, and compute the posterior p_θ̂(z|x). However, under an fLDS neither p_θ(z|x) nor p_θ(x) is computationally tractable (due both to the noise model P and to the nonlinear observation model f(·)). As a result, we pursue a stochastic variational inference approach to simultaneously learn the parameters θ and infer the distribution of z.

The strategy of variational inference is to approximate the intractable posterior distribution p_θ(z|x) by a tractable distribution q_φ(z|x), which carries its own parameters φ.² With an approximate posterior³ in hand, we learn both p_θ(z, x) and q_φ(z|x) simultaneously by maximizing the evidence lower bound (ELBO) of the marginal log likelihood:

log p_θ(x) ≥ L(θ, φ; x) = Σ_{r=1}^R L(θ, φ; x_r) = Σ_{r=1}^R E_{q_φ(z_r|x_r)}[log (p_θ(x_r, z_r) / q_φ(z_r|x_r))].    (8)

² Here, we consider a posterior q_φ(z|x) that is conditioned explicitly upon x. However, this is not necessary for variational inference.
³ The approximate posterior is also sometimes called a "recognition model".

We optimize L(θ, φ; x) by stochastic gradient ascent, using a Monte Carlo estimate of the gradient ∇L. It is well documented that naive Monte Carlo estimates of ∇L are typically of very high variance, and strategies for variance reduction are an active area of research [14, 15]. Here, we take an auto-encoding variational Bayes (AEVB) approach [8, 9, 10] to estimating ∇L. In AEVB, we choose an easy-to-sample random variable ε ~ p(ε) and sample z through a transformation of the random sample ε, parameterized by the observations x and parameters φ, z = h_φ(x, ε), obtaining a rich family of variational distributions q_φ(z|x). We then use an unbiased gradient estimator on minibatches consisting of a single randomly selected trial x_r:

∇L(θ, φ; x) ≈ R ∇L(θ, φ; x_r)    (9)
≈ R [ (1/L) Σ_{l=1}^L ∇ log p_θ(x_r, h_φ(x_r, ε_l)) − ∇ E_{q_φ(z_r|x_r)}[log q_φ(z_r|x_r)] ],    (10)

where the ε_l are iid samples from p(ε). In practice, we evaluate the gradient in eq. 9 using a single sample from p(ε) (L = 1) and use ADADELTA for stochastic optimization [16].

Choice of approximate posterior q_φ(z|x): The AEVB approach to inference is appealing in its generality: it is well defined for a large class of generative models p_θ(x, z) and approximate posteriors q_φ(z|x). In practice, however, the performance of the algorithm depends strongly upon the particular structure of these models. In our case, we use an approximate posterior that is designed explicitly to parameterize a temporally correlated posterior [17]. We use a Gaussian approximate posterior,

q_φ(z_r|x_r) = N(μ_φ(x_r), Σ_φ(x_r)),    (11)

where μ_φ(x_r) is an mT × 1 mean vector and Σ_φ(x_r) is an mT × mT covariance matrix.
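The reparameterized estimator of eqs. 9-10 can be sketched on a toy problem with a scalar latent. The model p(z) = N(0, 1), p(x|z) = N(z, 1) and the function name are our own illustrative choices, not the paper's fLDS.

```python
import numpy as np

def elbo_estimate(x, mu, log_s, n_samples=1000, seed=0):
    """Monte Carlo ELBO for p(z)=N(0,1), p(x|z)=N(z,1) with q(z|x)=N(mu, s^2),
    using the reparameterization z = mu + s * eps, eps ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    s = np.exp(log_s)
    eps = rng.standard_normal(n_samples)
    z = mu + s * eps                                   # samples from q via eps
    log_p_z = -0.5 * (z**2 + np.log(2 * np.pi))        # log prior
    log_p_x_given_z = -0.5 * ((x - z)**2 + np.log(2 * np.pi))  # log likelihood
    log_q = -0.5 * (((z - mu) / s)**2 + np.log(2 * np.pi)) - log_s
    return np.mean(log_p_z + log_p_x_given_z - log_q)  # ELBO estimate
```

At the exact posterior q(z|x) = N(x/2, 1/2) the bound is tight, and the integrand log p(x, z) − log q(z|x) equals log p(x) for every sample, so this estimator has zero variance there; one reason a well-matched posterior family matters.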
Both μ_φ(x_r) and Σ_φ(x_r) are parameterized by the observations x through a structured neural network, as described in detail in the supplementary material. We can sample from this approximate posterior by setting p(ε) = N(0, I) and h_φ(ε; x) = μ_φ(x) + Σ_φ^{1/2}(x) ε, where Σ_φ^{1/2} is the Cholesky factor of Σ_φ. This approach is similar to that of [8], except that we impose a block-tridiagonal structure upon the precision matrix Σ^{−1} (rather than a diagonal covariance), which can express rich temporal correlations across time (essential for the posterior to capture the smooth, correlated trajectories typical of LDS posteriors), while remaining tractable, with a computational complexity that scales linearly with T, the length of a trial.

5 Experiments

5.1 Simulation experiments

Linear dynamical system models with shared, fixed rate function: Our AEVB approach in principle permits inference in any latent LDS model. To illustrate this flexibility, we simulate 3 datasets from previously proposed models of neural responses. In our simulations, each data-generating model has a latent LDS state of m = 2 dimensions, as described by eq. 1 and eq. 2. In all data-generating models, spike rates depend on the latent state variable through a fixed link function f that is common across neurons. Each data-generating model has a distinct observation model (eq. 3): Bernoulli (logistic link), Poisson (exponential link), or negative binomial (exponential link).

We compare PLDS and GCLDS model fits to each dataset, using both our AEVB algorithm and two EM-based inference algorithms: LapEM (which approximates p(z|x) with a multivariate Gaussian by Laplace approximation in the E-step [1, 3]) and VBDual (which approximates p(z|x) with a multivariate Gaussian by variational inference, through optimization in the dual space [18, 3]). Additionally, we fit PfLDS and GCfLDS models with the AEVB algorithm.
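The block-tridiagonal precision structure described in Section 4.2 mirrors the precision of the LDS prior itself, which a short sketch makes explicit. This is a toy construction with made-up parameter values; in the paper the analogous blocks are parameterized by a recognition network, not built from the generative parameters.

```python
import numpy as np

def lds_prior_precision(A, Q, Q1, T):
    """Precision matrix of the joint LDS prior p(z_1, ..., z_T) from eqs. 1-2.
    The result is block tridiagonal: only neighboring time blocks interact."""
    m = A.shape[0]
    Qi, Q1i = np.linalg.inv(Q), np.linalg.inv(Q1)
    P = np.zeros((m * T, m * T))
    for t in range(T):
        sl = slice(m * t, m * (t + 1))
        P[sl, sl] += Q1i if t == 0 else Qi            # from the term at time t
        if t < T - 1:                                  # coupling to time t+1
            P[sl, sl] += A.T @ Qi @ A
            nxt = slice(m * (t + 1), m * (t + 2))
            P[sl, nxt] = -A.T @ Qi
            P[nxt, sl] = -Qi @ A
    return P
```

Because the precision is banded, its Cholesky factor is banded as well, which is what makes sampling and evaluating the approximate posterior O(T) rather than O(T³).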
On this linear simulated data we do not expect these nonlinear techniques to outperform linear methods. In all simulation studies we generate 20 training trials and 20 testing trials, with 100 simulated neurons and 200 time bins per trial. Results are averaged across 10 repeats.

We compare the predictive performance and running times of the algorithms in Table 1. For both PLDS and GCLDS, our AEVB algorithm gives results comparable to, though slightly worse than, the LapEM and VBDual algorithms. Although PfLDS and GCfLDS assume a much more complicated generative model, both provide comparable predictive performance and running time.

Table 1: Simulation results with a linear observation model. Each column group contains results for a distinct experiment, where the true data-generating distribution was either Bernoulli, Poisson, or negative binomial. For each generative model and inference algorithm (one per row), we report the predictive log likelihood (PLL, divided by the number of observations, on test data using one-step-ahead prediction) and computation time (in minutes). When training a model using the AEVB algorithm, we run 500 epochs before stopping. For LapEM and VBDual, we initialize with nuclear norm minimization [2] and stop either after 200 iterations or when the ELBO (scaled by the number of time bins) increases by less than ε = 10⁻⁹ after one iteration.

Model    Inference   Bernoulli         Poisson           Negative-binomial
                     PLL      Time     PLL      Time     PLL      Time
PLDS     LapEM       -0.446   3        -0.385   5        -0.359   5
PLDS     VBDual      -0.446   157      -0.385   170      -0.359   138
PLDS     AEVB        -0.445   50       -0.387   55       -0.363   53
PfLDS    AEVB        -0.445   56       -0.387   58       -0.362   50
GCLDS    LapEM       -0.389   40       -0.385   97       -0.359   101
GCLDS    VBDual      -0.389   131      -0.385   126      -0.359   127
GCLDS    AEVB        -0.390   69       -0.386   75       -0.361   73
GCfLDS   AEVB        -0.390   72       -0.386   76       -0.361   68
We note that while LapEM is competitive in running time in this relatively small-data setting, the AEVB algorithm may be more desirable in a large-data setting, where it can learn model parameters even before seeing the full dataset. In contrast, both LapEM and VBDual require a full pass through the data in the E-step before the M-step parameter updates. The recognition model used by AEVB can also be used to initialize LapEM and VBDual in the linear LDS case.

Simulation with "grid cell"-type responses: A grid cell is a type of neuron that is activated when an animal occupies any vertex of a grid spanning the environment [19]. When an animal moves along a one-dimensional line in space, grid cells exhibit oscillatory responses. Motivated by the response properties of grid cells, we simulated a population of 100 spiking neurons with oscillatory link functions and a shared, one-dimensional input z_{rt} ∈ ℝ given by

z_{r1} = 0,    (12)
z_{r(t+1)} ~ N(0.99 z_{rt}, 0.01).    (13)

The log firing rate of each neuron, indexed by i, is coupled to the latent variable z_{rt} through a sinusoid with a neuron-specific phase φ_i and frequency ω_i,

x_{rti} ~ Poisson(λ_{rti} = exp(2 sin(ω_i z_{rt} + φ_i) − 2)).    (14)

We generated the φ_i uniformly at random in [0, 2π] and set ω_i = 1 for neurons with index i ≤ 50 and ω_i = 3 for neurons with index i > 50. We simulated 150 training and 20 testing trials, each with T = 120 time bins. We repeated this simulated experiment 10 times.

We compare the performance of PLDS with PfLDS, both with a 1-dimensional latent variable. As shown in Figure 1, PLDS is not able to adapt to the nonlinear and non-monotonic link function, and cannot recover the true latent variable (left panel and bottom right panel) or spike rate (upper right panel). On the other hand, the PfLDS model captures the nonlinearity well, recovering the true latent trajectory.
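The simulation in eqs. 12-14 can be sketched in a few lines (the function name and seed handling are illustrative):

```python
import numpy as np

def simulate_grid_cells(R=150, T=120, n=100, seed=0):
    """Shared 1-D AR(1) latent (eqs. 12-13) driving Poisson spiking through a
    sinusoidal, non-monotonic link with per-neuron phase and frequency (eq. 14)."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0, 2 * np.pi, size=n)             # neuron-specific phases
    omega = np.where(np.arange(n) < n // 2, 1.0, 3.0)   # two frequency groups
    z = np.zeros((R, T))                                # z_{r1} = 0
    for t in range(T - 1):
        z[:, t + 1] = 0.99 * z[:, t] + np.sqrt(0.01) * rng.standard_normal(R)
    rates = np.exp(2 * np.sin(omega * z[..., None] + phi) - 2)  # (R, T, n)
    x = rng.poisson(rates)                              # spike counts
    return z, rates, x
```

Note the link bounds every rate between e⁻⁴ and 1 spike per bin, so each neuron's response to the latent is oscillatory rather than monotone, which is exactly what defeats the linear-exponential PLDS observation model.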
The one-step-ahead predictive log likelihood (PLL) on a held-out dataset is -0.622 (se = 0.006) for PLDS and -0.581 (se = 0.006) for PfLDS. A paired t-test for PLL is significant (p < 10⁻⁶).

5.2 Applications to experimentally recorded neural data

We analyze two multi-neuron spike-train datasets, recorded from primary visual cortex and primary motor cortex of the macaque brain, respectively. We find that fLDS models outperform PLDS in terms of predictive performance on held-out data. Further, we find that the latent trajectories uncovered by fLDS are lower-dimensional and more structured than those recovered by PLDS.

Figure 1: Sample simulation result with "grid cell"-type responses. Left panel: fitted latent variable compared to the true latent variable (PLDS R² = 0.75, PfLDS R² = 0.98); upper right panel: fitted rates compared to the true rates for 4 sample neurons (#49-#52); bottom right panel: inferred trace of the latent variable compared to the true latent trace. Note that the latent trajectory for a 1-dimensional latent variable is identifiable only up to a multiplicative constant, and here we scale the latent variables to lie between 0 and 1.

Macaque V1 with drifting grating stimulus at a single orientation: The dataset consists of 148 neurons simultaneously recorded from the primary visual cortex (area V1) of an anesthetized macaque, as described in [20] (array 5).
Data were recorded while the monkey watched a 1280 ms movie of a sinusoidal grating drifting in one of 72 orientations (0°, 5°, 10°, ...). Each of the 72 orientations was repeated R = 50 times. We analyze the spike activity from 300 ms to 1200 ms after stimulus onset. We discretize the data at Δt = 10 ms, resulting in T = 90 time points per trial. Following [20], we consider the 63 neurons with well-behaved tuning curves. We performed both single-orientation and whole-dataset analyses.

We first use 12 equally spaced grating orientations (0°, 30°, 60°, ...) and analyze each orientation separately. To increase sample size, for each orientation we pool data from the 2 neighboring orientations (e.g., for orientation 0° we include data from orientations 5° and 355°), thereby obtaining 150 trials for each dataset (we find similar, but more variable, results when we do not include neighboring orientations). For each orientation, we divide the data into 120 training trials and 30 testing trials. For PfLDS we further divide the 120 training trials into 110 trials for fitting and 10 trials for validation (we use the ELBO on the validation set to determine when to stop training). We do not include a stimulus model, but rather perform unsupervised learning to recover a low-dimensional representation that combines both internal and stimulus-driven dynamics.

We take orientation 0° as an example (the other orientations exhibit a similar pattern) and compare the fitted results of PLDS and PfLDS with a 2-dimensional latent space, which should in principle adequately capture the oscillatory pattern of the neural responses. We find that PfLDS is able to capture the nonlinear response characteristics of V1 complex cells (Fig. 2(a), black line), while PLDS can only reliably capture linear responses (Fig. 2(a), blue line). In Fig. 2(b)(c) we project all trajectories onto the 2-dimensional latent manifold described by the PfLDS.
We find that both techniques recover a manifold that reveals the rotational structure of the data; however, by offsetting the nonlinear features of the data into the observation model, PfLDS recovers a much cleaner latent representation (Fig. 2(c)).

We assess model-fit quality by one-step-ahead prediction on a held-out dataset, comparing both percentage mean squared error (MSE) reduction and percentage negative predictive log likelihood (NLL) reduction. We find that PfLDS recovers more compact representations than PLDS for the same performance in MSE and NLL. We illustrate this in Fig. 2(d)(e), where PLDS requires approximately 10 latent dimensions to obtain the same predictive performance as a PfLDS with 3 latent dimensions. This result makes intuitive sense: during the stimulus-driven portion of the experiment, neural activity is driven primarily by a low-dimensional, oscillatory stimulus drive (the drifting grating). We find that the highly nonlinear generative models used by PfLDS lead to lower-dimensional, and hence more interpretable, latent-variable representations.

Figure 2: Results for fits to Macaque V1 data (single orientation). (a) Comparison of the true firing rates (black) with the fitted rates from PLDS (blue) and PfLDS (red), each with a 2-dimensional latent space, for selected neurons (#115, #145, #77; orientation 0°, averaged across all 120 training trials); (b)(c) 2D latent-space embeddings of 10 sample training trials under PLDS and PfLDS, respectively; color denotes the phase of the grating stimulus (orientation 0°); (d)(e) predictive mean squared error (MSE) and predictive negative log likelihood (NLL) reduction with one-step-ahead prediction, as a function of latent dimensionality, compared to a baseline model (a homogeneous Poisson process). Results are averaged across 12 orientations.

To compare the performance of PLDS and PfLDS on the whole dataset, we use 10 trials from each of the 72 grating orientations (720 trials in total) as a training set, and 1 trial from each orientation as a test set. For PfLDS we further divide the 720 training trials into 648 for fitting and 72 for validation. We observe in Fig. 3(a)(b) that PfLDS again provides much better predictive performance with a small number of latent dimensions. We also find that, for PfLDS with 4 latent dimensions, when we project the observations into the latent space and take the first 3 principal components, the trajectories form a torus (Fig. 3(c)). Once again, this result has an intuitive appeal: just as the sinusoidal stimuli (for a fixed orientation, across time) are naturally embedded into a 2D ring, stimulus variation in orientation (at a fixed time) also has a natural circular symmetry. Taken together, the stimulus has a natural toroidal topology. We find that fLDS is capable of uncovering this latent structure, even without any prior knowledge of the stimulus structure.

Figure 3: Macaque V1 data fitting results (full data). (a)(b) Predictive MSE and NLL reduction.
(c) 3D embedding of the mean latent trajectory of the neural activity from 300 ms to 500 ms after stimulus onset, across grating orientations 0°, 5°, ..., 175°; here we use PfLDS with 4 latent dimensions and project the result onto the first 3 principal components. A video of the 3D embedding can be found at https://www.dropbox.com/s/cluev4fzfsob4q9/video_fLDS.mp4?dl=0

Macaque center-out reaching data: We analyzed neural population data recorded from the macaque motor cortex (G20040123); details can be found in [11, 1]. Briefly, the data consist of simultaneous recordings of 105 neurons during 56 cued reaches from the center of a screen to 14 peripheral targets. We analyze the reaching period (50 ms before and 370 ms after movement onset) for each trial. We discretize the data at Δt = 20 ms, resulting in T = 21 time points per trial. For each target we use 50 training trials and 6 testing trials, and fit all 14 reaching targets together (making 700 training trials and 84 testing trials). We use both Poisson and GC noise models, as GC has the flexibility to capture the noted under-dispersion of the data [3]. We compare PLDS with PfLDS fits, as well as GCLDS with GCfLDS fits. For both PfLDS and GCfLDS we further divide the training trials into 630 for fitting and 70 for validation.

As shown in Fig. 4(d), PfLDS and GCfLDS with latent dimension 2 or 3 outperform their linear counterparts with much larger latent dimensions. We also find that the GCLDS and GCfLDS models give much better predictive likelihood than their Poisson counterparts. In Fig. 4(b)(c) we project the neural activity onto the 2-dimensional latent space. We find that PfLDS (Fig. 4(c)) clearly separates the reaching trajectories and orders them in exact correspondence with the true spatial locations of the targets.

Figure 4: Macaque center-out reaching data analysis. (a) 5 sample reaching trajectories for each of the 14 target locations; directions are coded by color and distances by marker size; (b)(c) 2D embeddings of neural activity extracted by PLDS and PfLDS; circles represent 50 ms before movement onset and triangles 340 ms after movement onset; 5 training reaches are plotted for each target location; (d) predictive negative log likelihood (NLL) reduction with one-step-ahead prediction.

6 Discussion and Conclusion

We have proposed fLDS, a modeling framework for high-dimensional neural population data that extends previous latent, low-dimensional linear dynamical system models with a flexible, nonlinear observation model. Additionally, we described an efficient variational inference algorithm suitable for fitting a broad class of LDS models, including several previously proposed models. We illustrate in both simulation and application to real data that, even when a neural population is modulated by low-dimensional linear dynamics, a latent variable model with a linear rate function can fail to capture the true low-dimensional structure. In contrast, an fLDS can recover the low-dimensional structure, providing better predictive performance and more interpretable latent-variable representations.

[21] extends the linear Kalman filter by using neural network models to parameterize both the dynamics equation and the observation equation, and uses an RNN-based recognition model for inference.
[22] composes graphical models with neural network observations and proposes a structured autoencoder variational inference algorithm. Our work focuses on modeling count observations for neural spike train data, which is orthogonal to the papers mentioned above.

Our approach is distinct from related manifold learning methods [23, 24]. While most manifold learning techniques rely primarily on the notion of nearest neighbors, we exploit the temporal structure of the data by imposing strong prior assumptions about the dynamics of our latent space. Further, in contrast to most manifold learning approaches, our approach includes an explicit generative model that lends itself naturally to inference and prediction, and allows for count-valued observations that account for the discrete nature of neural data.

Future work includes relaxing the latent linear dynamical system assumption to incorporate more flexible latent dynamics (for example, by using a Gaussian process prior [12] or by incorporating a nonlinear dynamical phase space [25]). We also anticipate our approach may be useful in applications to neural decoding and prosthetics: once trained, our approximate posterior may be evaluated in close to real-time.

A Python/Theano [26, 27] implementation of our algorithms is available at http://github.com/earcher/vilds.

References

[1] J. H. Macke, L. Buesing, J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani, “Empirical models of spiking in neural populations,” in NIPS, pp. 1350–1358, 2011.

[2] D. Pfau, E. A. Pnevmatikakis, and L. Paninski, “Robust learning of low-dimensional dynamics from large neural ensembles,” in NIPS, pp. 2391–2399, 2013.

[3] Y. Gao, L. Busing, K. V. Shenoy, and J. P. Cunningham, “High-dimensional neural spike train analysis with generalized count linear dynamical systems,” in NIPS, pp. 2035–2043, 2015.

[4] M. M. Churchland, J. P. Cunningham, M. T.
Kaufman, J. D. Foster, P. Nuyujukian, S. I. Ryu, and K. V. Shenoy, “Neural population dynamics during reaching,” Nature, vol. 487, no. 7405, pp. 51–56, 2012.

[5] R. L. Goris, J. A. Movshon, and E. P. Simoncelli, “Partitioning neuronal variability,” Nature Neuroscience, vol. 17, no. 6, pp. 858–865, 2014.

[6] A. S. Ecker, P. Berens, R. J. Cotton, M. Subramaniyan, G. H. Denfield, C. R. Cadwell, S. M. Smirnakis, M. Bethge, and A. S. Tolias, “State dependence of noise correlations in macaque primary visual cortex,” Neuron, vol. 82, no. 1, pp. 235–248, 2014.

[7] E. W. Archer, U. Koster, J. W. Pillow, and J. H. Macke, “Low-dimensional models of neural population activity in sensory cortical circuits,” in NIPS, pp. 343–351, 2014.

[8] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.

[9] M. Titsias and M. Lázaro-Gredilla, “Doubly stochastic variational Bayes for non-conjugate inference,” in ICML, pp. 1971–1979, 2014.

[10] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” arXiv preprint arXiv:1401.4082, 2014.

[11] B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani, “Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity,” Journal of Neurophysiology, vol. 102, no. 1, pp. 614–635, 2009.

[12] Y. Zhao and I. M. Park, “Variational latent Gaussian process for recovering single-trial dynamics from population spike trains,” arXiv preprint arXiv:1604.03053, 2016.

[13] L. Buesing, T. A. Machado, J. P. Cunningham, and L. Paninski, “Clustered factor analysis of multineuronal spike data,” in NIPS, pp. 3500–3508, 2014.

[14] Y. Burda, R. Grosse, and R.
Salakhutdinov, “Importance weighted autoencoders,” arXiv preprint arXiv:1509.00519, 2015.

[15] R. Ranganath, S. Gerrish, and D. M. Blei, “Black box variational inference,” arXiv preprint arXiv:1401.0118, 2013.

[16] M. D. Zeiler, “ADADELTA: An adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012.

[17] E. Archer, I. M. Park, L. Buesing, J. Cunningham, and L. Paninski, “Black box variational inference for state space models,” arXiv preprint arXiv:1511.07367, 2015.

[18] M. Emtiyaz Khan, A. Aravkin, M. Friedlander, and M. Seeger, “Fast dual variational inference for non-conjugate latent Gaussian models,” in ICML, pp. 951–959, 2013.

[19] T. Hafting, M. Fyhn, S. Molden, M.-B. Moser, and E. I. Moser, “Microstructure of a spatial map in the entorhinal cortex,” Nature, vol. 436, no. 7052, pp. 801–806, 2005.

[20] A. B. Graf, A. Kohn, M. Jazayeri, and J. A. Movshon, “Decoding the activity of neuronal populations in macaque primary visual cortex,” Nature Neuroscience, vol. 14, no. 2, pp. 239–245, 2011.

[21] R. G. Krishnan, U. Shalit, and D. Sontag, “Deep Kalman filters,” arXiv preprint arXiv:1511.05121, 2015.

[22] M. J. Johnson, D. Duvenaud, A. B. Wiltschko, S. R. Datta, and R. P. Adams, “Composing graphical models with neural networks for structured representations and fast inference,” arXiv preprint arXiv:1603.06277, 2016.

[23] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.

[24] J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.

[25] R. Frigola, Y. Chen, and C. Rasmussen, “Variational Gaussian process state-space models,” in NIPS, pp.
3680–3688, 2014.

[26] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley, and Y. Bengio, “Theano: new features and speed improvements,” arXiv preprint arXiv:1211.5590, 2012.

[27] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, “Theano: a CPU and GPU math expression compiler,” in Proceedings of the Python for Scientific Computing Conference (SciPy), vol. 4, p. 3, Austin, TX, 2010.