{"title": "Empirical models of spiking in neural populations", "book": "Advances in Neural Information Processing Systems", "page_first": 1350, "page_last": 1358, "abstract": "Neurons in the neocortex code and compute as part of a locally interconnected population. Large-scale multi-electrode recording makes it possible to access these population processes empirically by fitting statistical models to unaveraged data. What statistical structure best describes the concurrent spiking of cells within  a local network? We argue that in the cortex, where firing exhibits extensive correlations in both time and space and where a typical sample of neurons still reflects only a very small fraction of the local population, the most appropriate model captures shared variability by a low-dimensional latent process evolving with smooth dynamics, rather than by putative direct coupling. We test this claim by comparing  a latent dynamical model with realistic spiking observations to coupled generalised  linear spike-response models (GLMs) using cortical recordings. We find that the latent dynamical approach outperforms the GLM in terms of goodness-of-fit, and reproduces the temporal correlations in the data more accurately. We also compare models whose observations models are either derived from a Gaussian or point-process models, finding that the non-Gaussian model provides slightly  better goodness-of-fit and more realistic population spike counts.", "full_text": "Empirical models of spiking in neural populations\n\nJakob H. Macke\n\nLars B\u00a8using\n\nGatsby Computational Neuroscience Unit\n\nGatsby Computational Neuroscience Unit\n\nUniversity College London, UK\njakob@gatsby.ucl.ac.uk\n\nUniversity College London, UK\nlars@gatsby.ucl.ac.uk\n\nJohn P. Cunningham\n\nDepartment of Engineering\nUniversity of Cambridge, UK\n\njpc74@cam.ac.uk\n\nByron M. Yu\nECE and BME\n\nCarnegie Mellon University\n\nbyronyu@cmu.edu\n\nKrishna V. 
Shenoy\n\nDepartment of Electrical Engineering\n\nStanford University\n\nshenoy@stanford.edu\n\nManeesh Sahani\n\nGatsby Computational Neuroscience Unit\n\nUniversity College London, UK\n\nmaneesh@gatsby.ucl.ac.uk\n\nAbstract\n\nNeurons in the neocortex code and compute as part of a locally interconnected\npopulation. Large-scale multi-electrode recording makes it possible to access\nthese population processes empirically by \ufb01tting statistical models to unaveraged\ndata. What statistical structure best describes the concurrent spiking of cells within\na local network? We argue that in the cortex, where \ufb01ring exhibits extensive corre-\nlations in both time and space and where a typical sample of neurons still re\ufb02ects\nonly a very small fraction of the local population, the most appropriate model cap-\ntures shared variability by a low-dimensional latent process evolving with smooth\ndynamics, rather than by putative direct coupling. We test this claim by compar-\ning a latent dynamical model with realistic spiking observations to coupled gen-\neralised linear spike-response models (GLMs) using cortical recordings. We \ufb01nd\nthat the latent dynamical approach outperforms the GLM in terms of goodness-of-\n\ufb01t, and reproduces the temporal correlations in the data more accurately. We also\ncompare models whose observations models are either derived from a Gaussian\nor point-process models, \ufb01nding that the non-Gaussian model provides slightly\nbetter goodness-of-\ufb01t and more realistic population spike counts.\n\n1\n\nIntroduction\n\nMulti-electrode array recording and similar methods provide measurements of activity from dozens\nof neurons simultaneously, and thus allow unprecedented insights into the statistical structure of\nneural population activity. To exploit this potential we need methods that identify the temporal dy-\nnamics of population activity and link it to external stimuli and observed behaviour. 
These statistical\nmodels of population activity are essential for understanding neural coding at a population level [1]\nand can have practical applications for Brain Machine Interfaces [2].\nTwo frameworks for modelling the temporal dynamics of cortical population recordings have re-\ncently become popular. Generalised Linear spike-response Models (GLMs) [1, 3, 4, 5] model the\nin\ufb02uence of spiking history, external stimuli or other neural signals on the \ufb01ring of a neuron. Here,\nthe interdependence of different neurons is modelled by terms that link the instantaneous \ufb01ring rate\nof each neuron to the recent spiking history of the population. The parameters of the GLM can be\nlearned ef\ufb01ciently by convex optimisation [3, 4, 5, 6]. Such models have been successful in a range\nof studies and systems, including retinal [1] and cortical [7] population recordings.\nAn alternative is provided by latent variable models such as Gaussian Process Factor Analysis [8]\nor other state-space models [9, 10, 11]. In this approach, shared variability (or \u2018noise correlation\u2019)\nis modelled by an unobserved process driving the population, which is sometimes characterised as\n\u2018common input\u2019 [12, 13]. One advantage of this approach is that the trajectories of the latent state\nprovide a compact, low-dimensional representation of the population which can be used to visualise\npopulation activity, and link it to observed behaviour [14].\n\n1.1 Comparing coupled generalised linear models and latent variable models\n\nThree lines of argument suggest that latent dynamical models may provide a better \ufb01t to cortical\npopulation data than the spike-response GLM. First, prevalent recording apparatus, such as extra-\ncellular grid electrodes, sample neural populations very sparsely making it unlikely that much of\nthe observed shared variability is a consequence of direct physical interaction. 
Hence, the coupling\n\ufb01lters of a GLM rather re\ufb02ect statistical interactions (sometimes called functional connectivity).\nWithout direct synaptic coupling, it is unlikely that variability is shared exclusively by particular\npairs of units; instead, it will generally be common to many cells\u2014an assumption explicit in the\nlatent variable approach, where shared variability results from the model of cortical dynamics.\nSecond, most cortical population recordings \ufb01nd that shared variability across neurons is dominated\nby a central peak at zero time lag (i.e. the strongest correlation is instantaneous) [15, 16], and has\nbroad, positive, sometimes asymmetric \ufb02anks, decaying slowly with lag time. Correlations with\nthese properties arise naturally in dynamical system models. The common input from the latent state\ninduces instantaneous correlations, and the evolution of the latent system typically yields positive\ntemporal correlations over moderate timescales. By contrast, GLMs couple instantaneous rate to the\nrecent spiking of other neurons, but not to their simultaneous activity, making zero-lag correlation\nhard to model. (As we show below in \u201cMethods\u201d, the inclusion of simultaneous terms would lead to\ninvalid models.) Instead, the common approach is to discretise time very \ufb01nely so that an off-zero\npeak can be brought close to simultaneity. This increases computational load, and often requires\ndiscretisation \ufb01ner than the time-scale of interest, perhaps even \ufb01ner than the recording resolution\n(e.g. for 2-photon calcium imaging). In addition, positive history coupling in a GLM may lead to\nloops of self-excitation, predicting unrealistically high \ufb01ring rates\u2014a trend that must be countered\nby long-term negative self-coupling. 
Thus, while it is certainly not impossible to reproduce neural\ncorrelation structure with GLMs [1], they do not seem to be the natural choice for modelling time-\nseries of spike-counts with instantaneous correlations.\nThird, recording time, and therefore the data available to \ufb01t a model, is usually limited in vivo,\nespecially in behaving animals. This paucity of data places strong constraints on the number of\nparameters that can be identi\ufb01ed. In dynamical system models, the parameter count grows linearly\nwith population size (for a constant latent dimension), whereas the parameter count of a coupled GLM\ngrows quadratically with the number of neurons. Thus, GLMs may have many more parameters, and\ndepend on aggressive regularisation techniques to avoid over-\ufb01tting to small datasets.\nHere we show that population activity in monkey motor cortex is better \ufb01t by a dynamical system\nmodel than by a spike-response GLM; and that the dynamical system, but not a GLM of the same\ntemporal resolution, accurately reproduces the temporal structure of cross-correlations in these data.\n\n1.2 Comparing dynamical system models with spiking or Gaussian observations\n\nMany studies of population latent variable models assume Gaussian observation noise [8, 17] (but\nsee, e.g. [2, 11, 13, 18]). Given that spikes are discrete events in time, it seems more natural to use\na Poisson [10] or other point-process model [19] or, at coarser timescales, a count-process model.\nHowever, it is unclear what (if any) advantage such more realistic models confer. For example,\nPoisson decoding models do not always outperform Gaussian ones [2, 11]. Here, we describe a\nlatent linear dynamical system whose count distribution, when conditioned on all past observations,\nis Poisson (the Poisson linear dynamical system, or PLDS). 
Using a co-smoothing metric, we show\nthat this (computationally more expensive) count model predicts spike counts in our data better than\na Gaussian linear dynamical system (GLDS). The two models give substantially different population\nspike-count distributions, and the count approach is also more accurate on this measure than either\nthe GLDS or GLM.\n\n2 Methods\n\n2.1 Dynamical systems with count observations and time-varying mean rates\n\nWe \ufb01rst consider the count-process latent dynamical model (PLDS). Denote by $y^i_{kt}$ the observed\nspike-count of neuron i \u2208 {1 . . . q} at time bin t \u2208 {1 . . . T} of trial k \u2208 {1 . . . N}, and by yk =\nvec (yk,i=1:q,t=1:T ) the qT \u00d7 1 vector of all data observed on trial k. Neurons are assumed to\nbe conditionally independent given the low-dimensional latent state xkt (of dimensionality p with\np < q). Thus, correlated neural variability arises from variations of this latent population state, and\nnot from direct interaction between neurons. Conditioned on x and the recent spiking history st, the\nactivity of neuron i at time t is given by a Poisson distribution with mean\n\n$$E[y^i_{kt} \\mid s_{kt}, x_{kt}] = \\exp([C x_{kt} + d + D s_{kt}]_i), \\qquad (1)$$\n\nwhere the q \u00d7 p matrix C determines how each neuron is related to the latent state xkt, and the\nq-dimensional vector d controls the mean \ufb01ring rates of the population. The history term st is a\nvector of all relevant recent spiking in the population [1, 3, 7, 20]. For example, one choice to model\nspike refractoriness would set skt to the counts at the previous time point skt = yk,(t\u22121), and D to\na diagonal matrix of size q \u00d7 q with negative diagonal entries. In general, however, s and D may\ncontain entries that re\ufb02ect temporal dependence on a longer time-scale. 
However, to maintain the\nconditional independence of neurons given latent state the matrix D (of size q \u00d7 dim(s)) is con-\nstrained to have zero values at all entries corresponding to cross-neuron couplings. The exponential\nnonlinearity ensures that the conditional \ufb01ring rate of each neuron is positive. Furthermore, while\nconditioned on the latent state and the recent spiking history the count in each bin is Poisson dis-\ntributed (hence the model name), samples from the model are not Poisson as they are affected both\nby variations in the underlying state and the single-neuron history.\nWe assume that the latent population state xkt evolves according to driven linear Gaussian dynamics:\n\n$$x_{k1} \\sim \\mathcal{N}(x_o, Q_o) \\qquad (2)$$\n$$x_{k(t+1)} \\mid x_{kt} \\sim \\mathcal{N}(A x_{kt} + b_t, Q) \\qquad (3)$$\n\nHere, xo and Qo denote the average value and the covariance of the initial state x1 of each trial. The\np \u00d7 p matrix A speci\ufb01es the deterministic component of the evolution from one state to the next,\nand the matrix Q gives the covariance of the innovations that perturb the latent state at each time\nstep. The \u2018driving inputs\u2019 bt, which add to the latent state, allow the model to capture time-varying\nstructure in the \ufb01ring rates that is consistent across trials. Such time-varying mean \ufb01ring rates are\nusually characterised by the peri-stimulus time histogram (PSTH), which requires q \u00d7 T parameters\nto estimate for each stimulus. Here, by contrast, time-varying means are captured by the driving\ninputs into the latent state, and so only p \u00d7 T parameters are needed to describe all the PSTHs.\n\n2.2 Expectation-Maximisation for the PLDS model\n\nWe use an EM algorithm, similar to those described before [10, 11, 12], to learn the parameters\n\u0398 = {C, D, d, A, Q, Qo, xo}. 
The E-step requires the posterior distribution P (\u00afxk|yk, \u0398) over the\nlatent trajectories \u00afxk = vec (xk,1:T ) given the data and our current estimate of the parameters \u0398.\nAs this distribution is not available in closed-form, we approximate it by a multivariate Gaussian,\nP (\u00afxk|yk, \u0398) \u2248 N (\u00b5k, \u03a3k). As \u00afxk is a vector of length pT , \u00b5k and \u03a3k are of size pT \u00d7 1\nand pT \u00d7 pT , respectively. We \ufb01nd the mean \u00b5k and the covariance \u03a3k of this Gaussian via a\nglobal Laplace approximation [21], i.e. by maximising the log-posterior log P (\u00afxk|yk, \u0398) of each trial\nover \u00afxk, setting \u00b5k = argmax\u00afxP (\u00afx|yk, \u0398) to be the latent trajectory that achieves this maximum,\nand $\\Sigma_k = -\\left(\\nabla\\nabla_{\\bar{x}} \\log P(\\bar{x} \\mid y_k, \\Theta)\\big|_{\\bar{x}=\\mu_k}\\right)^{-1}$ to be the negative inverse Hessian of the log-posterior\nat its maximum. The log-posterior on trial k is given by\n\n$$\\log P(\\bar{x}_k \\mid y_k, \\Theta) = \\mathrm{const} + \\sum_{t=1}^{T} \\left( y_{kt}^\\top (C x_{kt} + D s_{kt} + d) - \\sum_{i=1}^{q} \\exp[C x_{kt} + D s_{kt} + d]_i \\right) - \\frac{1}{2} (x_{k1} - x_o)^\\top Q_o^{-1} (x_{k1} - x_o) - \\frac{1}{2} \\sum_{t=1}^{T-1} (x_{k,t+1} - A x_{kt} - b_t)^\\top Q^{-1} (x_{k,t+1} - A x_{kt} - b_t) \\qquad (4)$$\n\nLog-posteriors of this type are concave and hence unimodal [5, 6], and the Markov structure of the\nlatent dynamics makes it possible to compute a Newton update in O(T ) time [22]. Furthermore, it\nhas previously been observed that the Laplace approximation performs well for similar models with\nPoisson observations [23]. We checked the quality of the Laplace approximation for our parameter\nsettings by drawing samples from the true posterior in a few cases. 
The agreement was generally\ngood, with only some minor deviations between the approximated and sampled covariances.\nThe M-step requires optimisation of the expected joint log-likelihood with respect to the parameters\n\u0398, i.e. $\\Theta^{\\mathrm{new}} = \\arg\\max_{\\Theta'} L(\\Theta')$ with\n\n$$L(\\Theta') = \\sum_k \\int \\left[ \\log P(y_k \\mid x, \\Theta') + \\log P(x \\mid \\Theta') \\right] \\mathcal{N}(x \\mid \\mu_k, \\Sigma_k) \\, dx. \\qquad (5)$$\n\nThis integral can be evaluated in closed form, and ef\ufb01ciently optimised over the parameters: $L(\\Theta')$\nis jointly concave in the parameters C, d, D, and the updates with respect to the dynamics parameters\nA, Q, Qo, xo and the driving inputs bt can be calculated analytically.\nOur use of the Laplace approximation in the E-step breaks the usual guarantee of non-decreasing\nlikelihoods in EM. Furthermore, the full likelihood of the model can only be approximated using\nsampling techniques [11]. We therefore monitored convergence using the leave-one-neuron-out\nprediction score [8] that we also used for comparisons with alternative methods (see below): For\neach trial in the test-set, and for each neuron i, we calculated its most likely \ufb01ring rate given the\nactivity of the other neurons $y^{-i}_{k,1:T}$, and then compared this prediction against the observed activity.\nIf implemented naively, this requires q inferences of the latent state from the activity of q\u22121 neurons.\nHowever, this computation can be sped up by an order of magnitude by \ufb01rst \ufb01nding the most likely\nstate given all neurons, and then performing one Newton-update for each held out neuron from this\ninitial state. 
While this approximate approach yielded accurate results, we only used it for tracking\nconvergence of the algorithm, not for reporting the results in section 3.1.\n\n2.3 Alternative models: Generalised Linear Models and Gaussian dynamical systems\n\nThe spike-response GLM models the instantaneous rate of neuron i at time t by a generalised linear\nform [4] with input covariates representing stimulus (or time) and population spike history:\n\n$$\\lambda^i_{kt} = E(y^i_{kt} \\mid s_{kt}) = \\exp([b_t + d + D s_{kt}]_i). \\qquad (6)$$\n\nThe coupling matrix D describes dependence both on the history of \ufb01ring in the same neuron and\non spiking in other neurons, and the q \u00d7 1 vectors bt model time-varying mean \ufb01ring rates. The parameters\nare estimated by minimising the negative log-likelihood $L_{\\mathrm{dat}} = -\\sum_{kti} (y^i_{kt} \\log \\lambda^i_{kt} - \\lambda^i_{kt})$.\nWhile equation (6) is similar to the de\ufb01nition of the PLDS model in equation (1), the models differ\nin their treatment of shared variability: The GLM has no latent state xt and so shared variance is\nmodelled through the cross-coupling terms of the matrix D, which are set to 0 in the PLDS.\nAs the number of parameters in the GLM is quadratic in population size, it may be prone to over-\n\ufb01tting on small datasets. 
To improve the generalisation ability of the GLM we added a sparsity-\ninducing L1 prior on the coupling parameters and a smoothness prior on the PSTH parameters bt,\nand minimized the (convex) cost function using methods described in [24]:\n\n$$L(b, d, D) = L_{\\mathrm{dat}} + \\eta_1 \\sum_{ij} |D_{ij}| + \\frac{1}{2\\eta_2} \\sum_t b_t^\\top K_{\\eta_3}^{-1} b_t. \\qquad (7)$$\n\nHere, the regularization parameter \u03b71 determines the sparsity of the solution D, \u03b72 is the prior\nvariance of the smoothing prior, and $K_{\\eta_3}(t, s) = \\exp(-(s - t)^2/\\eta_3^2)$ is a squared-exponential\nprior on the time-varying \ufb01ring rates bt which ensures their smoothness over time.\nIt is important to note that GLMs with Poisson conditionals cannot easily be extended to allow for\ninstantaneous couplings between neurons. Suppose that we sought a model whose couplings were\nonly instantaneous, with conditional distributions $y_{it} \\mid y_{-i,t} \\sim \\mathrm{Poiss}(D_{(i,-i)} y_{-i,t})$. It can be\nveri\ufb01ed that the model $P(y) = \\frac{1}{Z} \\exp(y^\\top J y) / \\prod_i y_i!$, which could be regarded as the Poisson equiv-\nalent to the Ising model [25], would provide such a structure (as long as J has a zero diagonal). In\nthis model, $P(y_{it} \\mid y_{-i,t}) \\propto \\exp(y_{i,t} \\sum_{j \\neq i} D_{ij} y_{j,t}) / y_{i,t}!$. One might imagine that the parameters J\ncould be learnt by maximizing each of the conditional likelihoods over a row of J (effectively max-\nimising the pseudo-likelihood), and one could sample counts by Gibbs sampling, again exploiting\nthe fact that the conditional distributions are all Poisson. However, an obvious prerequisite would be\nthat a Z exists for which the model is normalised. Unfortunately, this becomes impossible as soon as\nany entry of J is positive. 
For example, if entry Jij is positive, then we can easily construct a \ufb01ring\npattern y for which probabilities diverge. Let the pattern y(n) have value n at entries i and j, and\nzeros otherwise. Then, for large n, we \ufb01nd that $\\log P(y^{(n)}) \\propto n^2 J_{ij} - 2 \\log(n!)$, which is dom-\ninated by the quadratic term, and therefore diverges, rendering the model unnormalizeable. Thus,\nthis \u201cPoisson equivalent\u201d of the Ising model cannot model positive interactions between neurons,\nlimiting its value.\nThe Poisson likelihood of the PLDS requires approximation and is computationally cumbersome.\nAn apparently less veridical alternative would be to model counts as conditionally Gaussian given\nthe latent state. We used the EM algorithm [9] to \ufb01t a linear dynamical system model with Gaussian\nnoise and driving inputs [17] (GLDS). In comparison with the Poisson model, the GLDS has an\nadditional set of q parameters corresponding to the variances of the Gaussian observation noise.\nFinally, we also compared PLDS to Gaussian Process Factor Analysis (GPFA) [8], a Gaussian model\nin which the latent trajectories are drawn not from a linear dynamical system, but from a more\ngeneral Gaussian Process with (here) a squared-exponential kernel. We did not include the driving\ninputs bt in this model, and used the full model for co-smoothing, i.e. we did not orthogonalise its\n\ufb01lters as was done in [8].\nWe quanti\ufb01ed goodness-of-\ufb01t using two measures of \u2018leave-one-neuron-out prediction\u2019 accuracy on\ntest data (see [8] for more detail). Each neuron\u2019s \ufb01ring rate was \ufb01rst predicted using the activity\nof all other neurons on each test trial. 
For the GLM (but not PLDS), predictions reported were\nbased on the past activity of other neurons, but also used the observed past activity of the neuron\nbeing predicted (results exploiting all data from other neurons were similar). Then we calculated\nthe difference between the total variance and the residual variance around this prediction $M_{i,k} = \\mathrm{var}(y^i_{k,1:T}) - \\mathrm{MSE}(y^i_{k,1:T}, y^{\\mathrm{pred}})$. Here, the predicted \ufb01ring rate is a vector of length T , and the\nvariance is computed over all times t = 1, . . . , T in trial k. Positive values indicate that prediction is\nmore accurate than a constant prediction equal to the true mean activity of that neuron on that trial.\nWe also constructed a receiver operating characteristic (ROC) for deciding based on the predicted\n\ufb01ring rates which bins were likely to contain at least one spike, and measured the area under this\ncurve (AUC) [7, 26]. This measure ranges between 0.5 and 1, with a value of 1 re\ufb02ecting correct\nidenti\ufb01cation of spike-containing bins, even if the predicted number of spikes is incorrect.\n\n2.4 Details of neural recordings and choice of parameters\n\nWe evaluated the methods described above on multi-electrode recordings from the motor cortex\nof a behaving monkey. The details of the data are described elsewhere [8]. Brie\ufb02y, spikes were\nrecorded with a 96-electrode array (Blackrock, Salt Lake City, UT) implanted into motor areas of a\nrhesus macaque (monkey G) performing a delayed center-out reach task. For the analyses presented\nhere, data came from 108 trials on which the monkey was instructed to reach to one target. We used\n1200 ms of data from each trial, from 200ms before target onset until the cue to move was presented.\nWe included 92 units (single and multi-units) with robust delay activity. Spike trains were binned at\n10ms which resulted in 8.13% of bins containing at least one spike, and in 0.61% of bins containing\nmore than one spike. 
For goodness-of-\ufb01t analyses, we performed 4-fold cross-validation, splitting\nthe data into four non-overlapping test folds with 27 trials each.\nFor the PLDS model, dimensionality of the latent state varied from 1 to 20. Models either had no\ndirect history-dependence (i.e. D = 0), or used spike history mapped to a set of 4 basis functions\nformed by orthogonalising decaying exponentials with time constants 0.1, 10, 20, 40ms (similar to\nthose used in [1]). The history term st was then obtained by projecting spike counts in the previous\n100ms onto each of these functions. The exponential with 0.1ms time constant effectively covered\nonly the previous time bin and was thus able to model refractoriness. In this case, D was of size\nq \u00d7 4q, with only 4 non-zero elements in each row. For the GLM, we varied the sparsity parameter\n\u03b71 from 0 to 1 (yielding estimates of D that ranged from a dense matrix to entirely 0), and com-\nputed prediction performance at each prior setting. After exploratory runs, the parameters of the\nsmoothness prior were set to \u03b72 = 0.1 and \u03b73 = 20ms.\n\n3 Results\n\n3.1 Goodness-of-\ufb01t of dynamical system models and GLMs\n\nWe \ufb01rst compared the goodness-of-\ufb01t of PLDS with p = 5 latent dimensions against that of GLMs.\nFor all choices of the regularization parameter \u03b71 tested, we found that the prediction performance of\nGLMs was inferior to that of PLDS (Fig. 1A). 
This was true for GLMs with history terms of length\n10ms, 100ms or 150ms (with 1, 4 or 5 basis functions each, which were equivalent to the history\nfunctions used for the spiking-history in the dynamical system model, with an additional 80 ms\ntime-constant exponential as the 5th basis function). To ensure that this difference in performance\nis not due to the GLM over-\ufb01tting the terms bt (which have q \u00d7 T parameters for the GLM, but only\np \u00d7 T parameters for PLDS), we \ufb01tted both GLMs and PLDS without those \ufb01lters. In this case, the\nprediction performance of both models decreased slightly, but the latent variable models still had\nsubstantially better prediction performance.\n\nFigure 1: Quantifying goodness-of-\ufb01t. A) Prediction performance (variance minus mean-squared\nerror on test-set) of various coupled GLMs (10 ms history; 2 variants with 100 ms history; 150 ms\nhistory) plotted against sparsity in the \ufb01lter matrix D generated by different choices of \u03b71. For all\n\u03b71, GLM prediction was poorer than that of PLDS with p = 5. Error bars on PLDS-performance\nare standard errors of mean across trials. B) As A, measuring performance by area under the ROC-\ncurve (AUC). C) Prediction performance of different latent variable models (GPFA, and LDSs with\nGaussian, Poisson or history-dependent Poisson noise) on the test-set. Black dots indicate dimen-\nsionalities where PLDS with 100ms history is signi\ufb01cantly better than GLDS (p < 0.05, pairwise\ncomparisons of trials). PLDS outperforms alternatives, and performance plateaus at small latent\ndimensionalities. D) As C, but using AUC to quantify prediction performance. 
The ordering of the\nmethods (at the optimal dimensionality) is similar, but there is no advantage of PLDS for higher\ndimensional models.\n\nOur performance metric based on the mean-squared error is sensitive both to the prediction of which\nbins contain spikes, as well as to how many they contain. To quantify the accuracy with which our\nmodels predicted only the absence or presence of spikes, we calculated the area under the curve\n(AUC) of the receiver operating characteristic [7]. As can be seen in Fig. 
1B, the PLDS outperformed\nthe GLMs over all choices of the regularization parameter \u03b71.\nNext, we investigated whether a more realistic spiking noise model would further improve the performance\nof the dynamical system model, and how this would depend on the latent dimensionality p. We\ntherefore compared our models (GPFA, GLDS, PLDS, PLDS with 100ms history) for different\nchoices of the latent dimensionality p. When quantifying prediction performance using the mean-\nsquared error, we found that for all four models, prediction performance on the test-set increased\nstrongly with dimensionality for small dimensions, but plateaued at about 8 to 10 dimensions (see\nFig. 1C). Thus, of the models considered here, a low-dimensional latent variable provides the best\n\ufb01t to the data.\n\n[Figure 1 plots. Panels A, B: x-axis 'Percentage of zero-entries in coupling matrix'; y-axes 'Var-MSE' and 'AUC'; legend: GLM 10ms, GLM 100ms, GLM2 100ms, GLM 150ms, PLDS dim=5. Panels C, D: x-axis 'Dimensionality of latent space'; y-axes 'Var-MSE' and 'AUC'; legend: GPFA, GLDS, PLDS, PLDS 100ms.]\n\nFigure 2: Temporal structure of cross-correlations. A) Average temporal cross-correlation in four\ngroups of neurons (color-coded from most to least correlated), and comparison with correlations\ncaptured by the dynamical system models with Gaussian, Poisson or history-dependent Poisson\nnoise. All three model correlations agree well with the data. 
B) Comparison of GLMs with differing\nhistory-dependence with cortical recordings; the correlations of the models differ markedly from\nthose of the data, and do not have a peak at zero time-lag.\n\nWe also found that models with the more realistic spiking noise model (PLDS, and PLDS 100ms)\nhad a small, but consistent performance bene\ufb01t over the computationally more ef\ufb01cient Gaussian\nmodels (GLDS, GPFA). However, for the dataset and comparison considered here (which was based\non predicting the mean activity averaged over all possible spiking histories), we only found a small\nadvantage of also adding single-neuron dynamics (i.e. the spike-history \ufb01lters in D) to the spiking\nnoise model. If we compared the models using their ability to predict population activity on the\nnext time-step from the observed population history, single-neuron \ufb01lters did have an effect. In this\nprediction task, PLDS with history \ufb01lters performed best, in particular better than GLMs.\nWhen using AUC rather than mean-squared-error to quantify prediction performance, we found\nsimilar results: Low-dimensional models showed best performance, spiking models slightly outper-\nformed Gaussian ones, and adding single-neuron dynamics yielded only a small bene\ufb01t. In addition,\nwhen using AUC, the performance bene\ufb01t of PLDS over GLDS was smaller, and was signi\ufb01cant\nonly at those state-dimensionalities for which overall prediction performance was best. Finally, both\nGPFA and GLDS at p = 5 outperformed all GLMs, both for using AUC and mean-squared-error.\nThus, all four of our latent variable models provided superior \ufb01ts to the dataset than GLMs.\n\n3.2 Reproducing the correlations of cortical population activity\n\nIn the introduction, we argued that dynamical system models would be more appropriate for captur-\ning the typical temporal structure of cross-neural correlations in cortical multi-cell recordings. 
We\nexplicitly tested this claim in our cortical recordings. First, we subtracted the time-varying mean\n\ufb01ring rate (PSTH) of each neuron to eliminate correlations induced by similarity in mean \ufb01ring\nrates. Then, we calculated time-lagged cross-correlations for each pair of neurons, using 10ms bins.\nFor display purposes, we divided neurons into 4 groups (color coded in Fig. 2) according to their\ntotal correlation (using summed correlation coef\ufb01cients with all other neurons), and calculated the\naverage pairwise correlation in each group. Fig. 2A shows the resulting average time-lagged correla-\ntions, and shows that both dynamical system models accurately capture this aspect of the correlation\nstructure of the data. In contrast Fig. 2B shows that the temporal correlations of the GLM differ\nmarkedly from the real data (see footnote 1). As mentioned before, this GLM is also \ufb01t at 10ms resolution, leaving\nopen the possibility that \ufb01tting it at a \ufb01ner temporal resolution would yield samples which more\nclosely re\ufb02ect the recorded correlations.\n\nFootnote 1: We used \u03b71 = 0, i.e. no regularization for this \ufb01gure, results with \u03b71 optimized for prediction performance\nvastly underestimate correlations in the data.\n\n[Figure 2 plots. Both panels: x-axis 'Time lag (ms)', y-axis 'Correlation'. Legend A: GLDS, PLDS, PLDS 100ms, recorded data. Legend B: GLM 10ms, GLM 100ms, GLM 150ms, recorded data.]\n\n3.3 Reproducing the distribution of spike-counts across the population\n\nIn the above, we showed that the PLDS model outperforms both Gaussian models and GLMs with\nrespect to our performance-metric, and that samples from both dynamical systems accurately cap-\nture the temporal correlation structure of the data. 
Finally, we looked at an aggregate measure of\npopulation activity, namely the distribution of population spike counts, i.e. the distribution of the\ntotal number of spikes across the population per time bin. This distribution is in\ufb02uenced both by the\nsingle-neuron spike-count distributions and second- and higher-order correlations across neurons.\nFig. 3 shows that the PLDS model accurately reproduces the spike-count distribution in the data,\nwhereas the other two models do not. The GLDS model underestimates the frequency of high spike\ncounts, despite accurately matching both the mean and the variance of the distribution. For the GLM\n(using 150ms history, and either no regularization or optimal regularization), the frequency of rare\nevents is either over- or under-estimated. This could be further indication that the GLM does not\nfully capture the fact that variability is shared across many cells in the population.\n\nFigure 3: Modeling population spike counts. Distribution of the population spike counts, and\ncomparison with distributions from PLDS, GLDS and two versions of the GLM with 150ms\nhistory dependence (GLM with no regularization, GLM2 with optimal sparsity).\n\n4 Discussion\n\nWe explored a statistical model of cortical population recordings based on a latent dynamical system\nwith count-process observations. We argued that such a model provides a more natural modeling\nchoice than coupled spike-response GLMs for cortical array-recordings; and indeed, this model did\n\ufb01t motor-cortical multi-unit recording better, and more faithfully reproduced the temporal structure\nof cross-neural correlations. GLMs have many attractive properties, and given the \ufb02exibility of\nthe model class, it is impossible to rule out that some coupled GLM with \ufb01ner temporal resolution,\npossibly nonlinear history dependencies and cleverly chosen regularization would yield better cross-\nvalidation performance. 
Here, we argued that latent variable models yield a more appropriate model\nof cross-neural correlations with zero-lag peaks: In GLMs, one has to use a \ufb01ne discretization of\nthe time-axis (which can be computationally intensive) or work in continuous time to achieve this.\nThus, GLMs might constitute good point-process models at \ufb01ne time-scales, but arguably not the right\ncount-process model for neural recordings at coarser time-scales.\nWe also showed that a model with count-process observations yields better \ufb01ts to our data than\nmodels with a Gaussian noise model, and that it has a more realistic distribution of population spike\ncounts. Given that spiking data is discrete and therefore non-Gaussian, this might not seem surpris-\ning. However, it is important to note that the Gaussian model has free parameters for the single-\nneuron variability, whereas the conditional variance of the Poisson model is constrained to equal the\nmean. For data in which this assumption is invalid, use of other count models, such as a negative\nbinomial distribution, might be more appropriate. In addition, \ufb01tting the PLDS model requires sim-\nplifying approximations, and these approximations could offset any gain in prediction performance.\nAs measured by our co-smoothing metrics, the performance advantage of the count-process over the\nGaussian noise model was small, and the question of whether this advantage would justify the con-\nsiderable additional computational cost of the count-process model will depend on the application at\nhand. In addition, any comparison of statistical models depends on the data used, as different meth-\nods are appropriate for datasets with different properties. For the recordings we considered here,\na dynamical system model with count-process observations worked best, but there will be datasets\nfor which GLMs, GLDS or GPFA provide the most appropriate model. 
Finally, the choice\nof the most appropriate model depends on the analysis or prediction question of interest. While\nwe used a co-smoothing metric to quantify model performance, different models might be more\nsuitable for decoding reaching movements from population activity [11], or inferring the underlying\nanatomical connectivity from extracellular recordings.\n\n[Figure 3 plot: frequency against number of spikes in 10ms bin; legend: GLM, GLM 2, GLDS, PLDS, real data.]\n\nAcknowledgements\n\nWe acknowledge support from the Gatsby Charitable Foundation, an EU Marie Curie Fellowship to JHM,\nEPSRC EP/H019472/1 to JPC, the Defense Advanced Research Projects Agency (DARPA) through \u201cReor-\nganization and Plasticity to Accelerate Injury Recovery (REPAIR; N66001-10-C-2010)\u201d, NIH CRCNS R01-\nNS054283 to KVS and MS as well as NIH Pioneer 1DP1OD006409 to KVS.\n\nReferences\n\n[1] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Spatio-\ntemporal correlations and visual signalling in a complete neuronal population. Nature,\n454(7207):995\u2013999, 2008.\n\n[2] G. Santhanam, B. M. Yu, V. Gilja, S. I. Ryu, A. Afshar, M. Sahani, and K. V. Shenoy. Factor-analysis\nmethods for higher-performance neural prostheses. J Neurophysiol, 102(2):1315\u20131330, 2009.\n\n[3] E.S. Chornoboy, L.P. Schramm, and A.F. Karr. Maximum likelihood identi\ufb01cation of neural point process\nsystems. Biological Cybernetics, 59(4):265\u2013275, 1988.\n\n[4] P. McCullagh and J. Nelder. Generalized linear models. Chapman and Hall, London, 1989.\n\n[5] L. Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network,\n15(4):243\u2013262, 2004.\n\n[6] S.P. Boyd and L. Vandenberghe. Convex optimization. Cambridge Univ Press, 2004.\n\n[7] W. Truccolo, L. R. Hochberg, and J. P. Donoghue. Collective dynamics in human and monkey sensori-\nmotor cortex: predicting single neuron spikes. 
Nat Neurosci, 13(1):105\u2013111, 2010.\n\n[8] B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani. Gaussian-process\nfactor analysis for low-dimensional single-trial analysis of neural population activity. J Neurophysiol,\n102(1):614\u2013635, 2009.\n\n[9] S. Roweis and Z. Ghahramani. A unifying review of linear gaussian models. Neural Comput,\n11(2):305\u2013345, 1999.\n\n[10] A. C. Smith and E. N. Brown. Estimating a state-space model from point process observations. Neural\nComput, 15(5):965\u201391, 2003.\n\n[11] V. Lawhern, W. Wu, N. Hatsopoulos, and L. Paninski. Population decoding of motor cortical activity\nusing a generalized linear model with hidden states. J Neurosci Methods, 189(2):267\u2013280, 2010.\n\n[12] J.E. Kulkarni and L. Paninski. Common-input models for multiple neural spike-train data. Network:\nComputation in Neural Systems, 18(4):375\u2013407, 2007.\n\n[13] M. Vidne, Y. Ahmadian, J. Shlens, J.W. Pillow, J. Kulkarni, E. J. Chichilnisky, E. P. Simoncelli, and\nL. Paninski. A common-input model of a complete network of ganglion cells in the primate retina. In\nComputational and Systems Neuroscience, 2010.\n\n[14] M. M. Churchland, B. M. Yu, M. Sahani, and K. V. Shenoy. Techniques for extracting single-trial activity\npatterns from large-scale neural recordings. Current Opinion in Neurobiology, 17(5):609\u2013618, 2007.\n\n[15] D. Y. Tso, C. D. Gilbert, and T. N. Wiesel. Relationships between horizontal interactions and functional\narchitecture in cat striate cortex revealed by cross-correlation analysis. J Neurosci, 6(4):1160\u20131170, 1986.\n\n[16] A. Jackson, V. J. Gee, S. N. Baker, and R. N. Lemon. Synchrony between neurons with similar muscle\n\ufb01elds in monkey motor cortex. Neuron, 38(1):115\u2013125, 2003.\n\n[17] W. Wu, Y. Gao, E. Bienenstock, J.P. Donoghue, and M.J. Black. Bayesian population decoding of motor\ncortical activity using a kalman \ufb01lter. 
Neural Comput, 18(1):80\u2013118, 2006.\n\n[18] B. Yu, A. Afshar, G. Santhanam, S.I. Ryu, K. Shenoy, and M. Sahani. Extracting dynamical structure\nembedded in neural activity. In Advances in Neural Information Processing Systems, volume 18, pages\n1545\u20131552. MIT Press, Cambridge, 2006.\n\n[19] J.P. Cunningham, B.M. Yu, K.V. Shenoy, and M. Sahani. Inferring neural \ufb01ring rates from spike trains\nusing gaussian processes. Advances in neural information processing systems, 20:329\u2013336, 2008.\n\n[20] U. T. Eden, L. M. Frank, R. Barbieri, V. Solo, and E. N. Brown. Dynamic analysis of neural encoding by\npoint process adaptive \ufb01ltering. Neural Comput, 16(5):971\u201398, 2004.\n\n[21] B. Yu, J. Cunningham, K. Shenoy, and M. Sahani. Neural decoding of movements: From linear to\nnonlinear trajectory models. In Neural Information Processing, pages 586\u2013595. Springer, 2008.\n\n[22] L. Paninski, Y. Ahmadian, D. G. Ferreira, S. Koyama, K. Rahnama Rad, M. Vidne, J. Vogelstein, and\nW. Wu. A new look at state-space models for neural data. J Comput Neurosci, 29(1-2):107\u2013126, 2010.\n\n[23] Y. Ahmadian, J. W. Pillow, and L. Paninski. Ef\ufb01cient markov chain monte carlo methods for decoding\nneural spike trains. Neural Comput, 23(1):46\u201396, 2011.\n\n[24] G. Andrew and J. Gao. Scalable training of L1-regularized log-linear models. In Proceedings of the 24th\ninternational conference on Machine learning, pages 33\u201340. ACM, 2007.\n\n[25] E. Schneidman, M. J. Berry II, R. Segev, and W. Bialek. Weak pairwise correlations imply strongly\ncorrelated network states in a neural population. Nature, 440(7087):1007\u201312, 2006.\n\n[26] T.D. Wickens. Elementary Signal Detection Theory. 
Oxford University Press, 2002.", "award": [], "sourceid": 781, "authors": [{"given_name": "Jakob", "family_name": "Macke", "institution": null}, {"given_name": "Lars", "family_name": "Buesing", "institution": null}, {"given_name": "John", "family_name": "Cunningham", "institution": null}, {"given_name": "Byron", "family_name": "Yu", "institution": null}, {"given_name": "Krishna", "family_name": "Shenoy", "institution": null}, {"given_name": "Maneesh", "family_name": "Sahani", "institution": null}]}