{"title": "Bayesian latent structure discovery from multi-neuron recordings", "book": "Advances in Neural Information Processing Systems", "page_first": 2002, "page_last": 2010, "abstract": "Neural circuits contain heterogeneous groups of neurons that differ in type, location, connectivity, and basic response properties. However, traditional methods for dimensionality reduction and clustering are ill-suited to recovering the structure underlying the organization of neural circuits. In particular, they do not take advantage of the rich temporal dependencies in multi-neuron recordings and fail to account for the noise in neural spike trains. Here we describe new tools for inferring latent structure from simultaneously recorded spike train data using a hierarchical extension of a multi-neuron point process model commonly known as the generalized linear model (GLM). Our approach combines the GLM with flexible graph-theoretic priors governing the relationship between latent features and neural connectivity patterns. Fully Bayesian inference via P\u00f3lya-gamma augmentation of the resulting model allows us to classify neurons and infer latent dimensions of circuit organization from correlated spike trains. We demonstrate the effectiveness of our method with applications to synthetic data and multi-neuron recordings in primate retina, revealing latent patterns of neural types and locations from spike trains alone.", "full_text": "Bayesian latent structure discovery from\n\nmulti-neuron recordings\n\nScott W. Linderman\nColumbia University\n\nswl2133@columbia.edu\n\nRyan P. Adams\n\nHarvard University and Twitter\n\nrpa@seas.harvard.edu\n\nJonathan W. Pillow\nPrinceton University\n\npillow@princeton.edu\n\nAbstract\n\nNeural circuits contain heterogeneous groups of neurons that differ in type, location,\nconnectivity, and basic response properties. 
However, traditional methods for\ndimensionality reduction and clustering are ill-suited to recovering the structure\nunderlying the organization of neural circuits. In particular, they do not take\nadvantage of the rich temporal dependencies in multi-neuron recordings and fail\nto account for the noise in neural spike trains. Here we describe new tools for\ninferring latent structure from simultaneously recorded spike train data using a\nhierarchical extension of a multi-neuron point process model commonly known as\nthe generalized linear model (GLM). Our approach combines the GLM with \ufb02exible\ngraph-theoretic priors governing the relationship between latent features and neural\nconnectivity patterns. Fully Bayesian inference via P\u00f3lya-gamma augmentation\nof the resulting model allows us to classify neurons and infer latent dimensions of\ncircuit organization from correlated spike trains. We demonstrate the effectiveness\nof our method with applications to synthetic data and multi-neuron recordings in\nprimate retina, revealing latent patterns of neural types and locations from spike\ntrains alone.\n\n1\n\nIntroduction\n\nLarge-scale recording technologies are revolutionizing the \ufb01eld of neuroscience [e.g., 1, 5, 15]. These\nadvances present an unprecedented opportunity to probe the underpinnings of neural computation,\nbut they also pose an extraordinary statistical and computational challenge: how do we make sense\nof these complex recordings? To address this challenge, we need methods that not only capture\nvariability in neural activity and make accurate predictions, but also expose meaningful structure\nthat may lead to novel hypotheses and interpretations of the circuits under study. In short, we need\nexploratory methods that yield interpretable representations of large-scale neural data.\nFor example, consider a population of distinct retinal ganglion cells (RGCs). These cells only respond\nto light within their small receptive \ufb01eld. 
Moreover, decades of painstaking work have revealed a\nplethora of RGC types [16]. Thus, it is natural to characterize these cells in terms of their type and\nthe location of their receptive field center. Rather than manually searching for such a representation\nby probing with different visual stimuli, here we develop a method to automatically discover this\nstructure from correlated patterns of neural activity.\nOur approach combines latent variable network models [6, 10] with generalized linear models of\nneural spike trains [11, 19, 13, 20] in a hierarchical Bayesian framework. The network serves as a\nbridge, connecting interpretable latent features of interest to the temporal dynamics of neural spike\ntrains. Unlike many previous studies [e.g., 2, 3, 17], our goal here is not necessarily to recover true\nsynaptic connectivity, nor is our primary emphasis on prediction. Instead, our aim is to explore\nand compare latent patterns of functional organization, integrating over possible networks. To do\nso, we develop an efficient Markov chain Monte Carlo (MCMC) inference algorithm by leveraging\nP\u00f3lya-gamma augmentation to derive collapsed Gibbs updates for the network. We illustrate the\nrobustness and scalability of our algorithm with synthetic data examples, and we demonstrate the\nscientific potential of our approach with an application to retinal ganglion cell recordings, where we\nrecover the true underlying cell types and locations from spike trains alone, without reference to the\nstimulus.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nFigure 1: Components of the generative model. (a) Neurons influence one another via a sparse weighted network\nof interactions. (b) The network parameterizes an autoregressive model with a time-varying activation. (c)\nSpike counts are randomly drawn from a discrete distribution with a logistic link function. Each spike induces\nan impulse response on the activation of downstream neurons. (d) Standard GLM analyses correspond to a\nfully-connected network with Gaussian or Laplace distributed weights, depending on the regularization. (e-g) In\nthis work, we consider structured models like the stochastic block model (SBM), in which neurons have discrete\nlatent types (e.g. square or circle), and the latent distance model, in which neurons have latent locations that\ndetermine their probability of connection, capturing intuitive and interpretable patterns of connectivity.\n\n2 Probabilistic Model\n\nFigure 1 illustrates the components of our framework. We begin with a prior distribution on networks\nthat generates a set of weighted connections between neurons (Fig. 1a). A directed edge indicates a\nfunctional relationship between the spikes of one neuron and the activation of its downstream neighbor.\nEach spike induces a weighted impulse response on the activation of the downstream neuron (Fig. 1b).\nThe activation is converted into a nonnegative firing rate from which spikes are stochastically sampled\n(Fig. 1c). These spikes then feed back into the subsequent activation, completing an autoregressive\nloop, the hallmark of the GLM [11, 19]. Models like these have provided valuable insight into\ncomplex population recordings [13]. We detail the three components of this model in the reverse\norder, working backward from the observed spike counts through the activation to the underlying\nnetwork.\n\n2.1 Logistic Spike Count Models\n\nGeneralized linear models assume a stochastic spike generation mechanism. Consider a matrix of\nspike counts, $S \\in \\mathbb{N}^{T \\times N}$, for T time bins and N neurons. The expected number of spikes fired by\nthe n-th neuron in the t-th time bin, E[st,n], is modeled as a nonlinear function of the instantaneous\nactivation, \u03c8t,n, and a static, neuron-specific parameter, \u03bdn.\n
Table 1 enumerates the three spike count\nmodels considered in this paper, all of which use the logistic function, $\\sigma(\\psi) = e^{\\psi}(1 + e^{\\psi})^{-1}$, to\nrectify the activation. The Bernoulli distribution is appropriate for binary spike counts, whereas the\nbinomial and negative binomial have support for s \u2208 [0, \u03bd] and s \u2208 [0,\u221e), respectively.\n\nTable 1: Table of conditional spike count distributions, their parameterizations, and their properties.\n\nDistribution | $p(s \\mid \\psi, \\nu)$ | Standard Form | E[s] | Var(s)\nBern(\u03c3(\u03c8)) | $\\sigma(\\psi)^{s}\\,\\sigma(-\\psi)^{1-s}$ | $\\frac{(e^{\\psi})^{s}}{1+e^{\\psi}}$ | $\\sigma(\\psi)$ | $\\sigma(\\psi)\\,\\sigma(-\\psi)$\nBin(\u03bd, \u03c3(\u03c8)) | $\\binom{\\nu}{s}\\,\\sigma(\\psi)^{s}\\,\\sigma(-\\psi)^{\\nu-s}$ | $\\binom{\\nu}{s}\\frac{(e^{\\psi})^{s}}{(1+e^{\\psi})^{\\nu}}$ | $\\nu\\,\\sigma(\\psi)$ | $\\nu\\,\\sigma(\\psi)\\,\\sigma(-\\psi)$\nNB(\u03bd, \u03c3(\u03c8)) | $\\binom{\\nu+s-1}{s}\\,\\sigma(\\psi)^{s}\\,\\sigma(-\\psi)^{\\nu}$ | $\\binom{\\nu+s-1}{s}\\frac{(e^{\\psi})^{s}}{(1+e^{\\psi})^{\\nu+s}}$ | $\\nu e^{\\psi}$ | $\\nu e^{\\psi}/\\sigma(-\\psi)$\n\nNotably\nlacking from this list is the Poisson distribution, which is not directly amenable to the augmentation\nschemes we derive below; however, both the binomial and negative binomial distributions converge to\nthe Poisson under certain limits.\n
Moreover, these distributions afford the added flexibility of modeling\nunder- and over-dispersed spike counts, a biologically significant feature of neural spiking data [4].\nSpecifically, while the Poisson has unit dispersion (its mean is equal to its variance), the binomial\ndistribution is always under-dispersed, since its mean always exceeds its variance, and the negative\nbinomial is always over-dispersed, with variance greater than its mean.\nImportantly, all of these distributions can be written in a standard form, as shown in Table 1. We\nexploit this fact to develop an efficient Markov chain Monte Carlo (MCMC) inference algorithm\ndescribed in Section 3.\n\n2.2 Linear Activation Model\n\nThe instantaneous activation of neuron n at time t is modeled as a linear, autoregressive function of\npreceding spike counts of neighboring neurons,\n\n$$\\psi_{t,n} \\triangleq b_n + \\sum_{m=1}^{N} \\sum_{\\Delta t=1}^{\\Delta t_{\\max}} h_{m \\to n}[\\Delta t] \\cdot s_{t-\\Delta t, m}, \\qquad (1)$$\n\nwhere bn is the baseline activation of neuron n and hm\u2192n : {1, . . . , \u2206tmax} \u2192 R is an impulse\nresponse function that models the influence that spikes of neuron m have on the activation of neuron n\nat a delay of \u2206t. To model the impulse response, we use a spike-and-slab formulation [8],\n\n$$h_{m \\to n}[\\Delta t] = a_{m \\to n} \\sum_{k=1}^{K} w^{(k)}_{m \\to n}\\, \\phi_k[\\Delta t]. \\qquad (2)$$\n\nHere, am\u2192n \u2208 {0, 1} is a binary variable indicating the presence or absence of a connection\nfrom neuron m to neuron n, the weight $w_{m \\to n} = [w^{(1)}_{m \\to n}, \\ldots, w^{(K)}_{m \\to n}]$ denotes the strength of the\nconnection, and $\\{\\phi_k\\}_{k=1}^{K}$ is a collection of fixed basis functions. In this paper, we consider scalar\nweights (K = 1) and use an exponential basis function, $\\phi_1[\\Delta t] = e^{-\\Delta t/\\tau}$, with time constant\nof \u03c4 = 15ms. Since the basis function and the spike train are fixed, we precompute the convolution of\nthe spike train and the basis function to obtain $\\widehat{s}^{(k)}_{t,m} = \\sum_{\\Delta t=1}^{\\Delta t_{\\max}} \\phi_k[\\Delta t] \\cdot s_{t-\\Delta t, m}$. Finally, we combine\nthe connections, weights, and filtered spike trains and write the activation as,\n\n$$\\psi_{t,n} = (a_n \\odot w_n)^{\\mathsf{T}}\\, \\widehat{s}_t, \\qquad (3)$$\n\nwhere $a_n = [1, a_{1 \\to n} 1_K, \\ldots, a_{N \\to n} 1_K]$, $w_n = [b_n, w_{1 \\to n}, \\ldots, w_{N \\to n}]$, and $\\widehat{s}_t = [1, \\widehat{s}^{(1)}_{t,1}, \\ldots, \\widehat{s}^{(K)}_{t,N}]$.\nHere, $\\odot$ denotes the Hadamard (elementwise) product and $1_K$ is a length-K vector of ones. Hence, all\nof these vectors are of size 1 + N K. The difference between our formulation and the standard GLM\nis that we have explicitly modeled the sparsity of the weights in am\u2192n. In typical formulations [e.g.,\n13], all connections are present and the weights are regularized with $\\ell_1$ and $\\ell_2$ penalties to promote\nsparsity. Instead, we consider structured approaches to modeling the sparsity and weights.\n\n2.3 Random Network Models\n\nPatterns of functional interaction can provide great insight into the computations performed by neural\ncircuits. Indeed, many circuits are informally described in terms of \u201ctypes\u201d of neurons that perform\na particular role, or the \u201cfeatures\u201d that neurons encode.\n
Random network models formalize these\nintuitive descriptions. Types and features correspond to latent variables in a probabilistic model that\ngoverns how likely neurons are to connect and how strongly they influence each other.\nLet A = {{am\u2192n}} and W = {{wm\u2192n}} denote the binary adjacency matrix and the real-valued\narray of weights, respectively. Now suppose $\\{u_n\\}_{n=1}^{N}$ and $\\{v_n\\}_{n=1}^{N}$ are sets of neuron-specific\nlatent variables that govern the distributions over A and W . Given these latent variables and global\nparameters \u03b8, the entries in A are conditionally independent Bernoulli random variables, and the\nentries in W are conditionally independent Gaussians. That is,\n\n$$p(A, W \\mid \\{u_n, v_n\\}_{n=1}^{N}, \\theta) = \\prod_{m=1}^{N} \\prod_{n=1}^{N} \\mathrm{Bern}(a_{m \\to n} \\mid \\rho(u_m, u_n, \\theta)) \\times \\mathcal{N}(w_{m \\to n} \\mid \\mu(v_m, v_n, \\theta), \\Sigma(v_m, v_n, \\theta)), \\qquad (4)$$\n\nwhere \u03c1(\u00b7), \u00b5(\u00b7), and \u03a3(\u00b7) are functions that output a probability, a mean vector, and a covariance\nmatrix, respectively. We recover the standard GLM when \u03c1(\u00b7) \u2261 1, but here we can take advantage\nof structured priors like the stochastic block model (SBM) [9], in which each neuron has a discrete\ntype, and the latent distance model [6], in which each neuron has a latent location. Table 2 outlines\nthe various models considered in this paper.\n\nTable 2: Random network models for the binary adjacency matrix or the Gaussian weight matrix.\n\nName | $\\rho(u_m, u_n, \\theta)$ | $\\mu(v_m, v_n, \\theta)$ | $\\Sigma(v_m, v_n, \\theta)$\nDense Model | $1$ | $\\mu$ | $\\Sigma$\nIndependent Model | $\\rho$ | $\\mu$ | $\\Sigma$\nStochastic Block Model | $\\rho_{u_m \\to u_n}$ | $\\mu_{v_m \\to v_n}$ | $\\Sigma_{v_m \\to v_n}$\nLatent Distance Model | $\\sigma(-\\|u_n - u_m\\|_2^2 + \\gamma_0)$ | $-\\|v_n - v_m\\|_2^2 + \\mu_0$ | $\\eta^2$\n\nWe can mix and match these models as shown in Figure 1(d-g). For example, in Fig.\n
1g, the adjacency\nmatrix is distance-dependent and the weights are block structured. Thus, we have a \ufb02exible language\nfor expressing hypotheses about patterns of interaction. In fact, the simple models enumerated above\nare instances of a rich family of exchangeable networks known as Aldous-Hoover random graphs,\nwhich have been recently reviewed by Orbanz and Roy [10].\n\n3 Bayesian Inference\n\nGeneralized linear models are often \ufb01t via maximum a posteriori (MAP) estimation [11, 19, 13, 20].\nHowever, as we scale to larger populations of neurons, there will inevitably be structure in the\nposterior that is not re\ufb02ected with a point estimate. Technological advances are expanding the number\nof neurons that can be recorded simultaneously, but \u201chigh-throughput\u201d recording of many individuals\nis still a distant hope. Therefore we expect the complexities of our models to expand faster than the\navailable distinct data sets to \ufb01t them. In this situation, accurately capturing uncertainty is critical.\nMoreover, in the Bayesian framework, we also have a coherent way to perform model selection\nand evaluate hypotheses regarding complex underlying structure. Finally, after introducing a binary\nadjacency matrix and hierarchical network priors, the log posterior is no longer a concave function of\nmodel parameters, making direct optimization challenging (though see Soudry et al. [17] for recent\nadvances in tackling similar problems). These considerations motivate a fully Bayesian approach.\nComputation in rich Bayesian models is often challenging, but through thoughtful modeling decisions\nit is sometimes possible to \ufb01nd representations that lead to ef\ufb01cient inference. In this case, we have\ncarefully chosen the logistic models of the preceding section in order to make it possible to apply\nthe P\u00f3lya-gamma augmentation scheme [14]. 
The principal advantage of this approach is that, given\nthe P\u00f3lya-gamma auxiliary variables, the conditional distribution of the weights is Gaussian, and\nhence is amenable to efficient Gibbs sampling. Recently, Pillow and Scott [12] used this technique to\ndevelop inference algorithms for negative binomial factor analysis models of neural spike trains. We\nbuild on this work and show how this conditionally Gaussian structure can be exploited to derive\nefficient, collapsed Gibbs updates.\n\n3.1 Collapsed Gibbs updates for Gaussian observations\n\nSuppose the observations were actually Gaussian distributed, i.e. st,n \u223c N (\u03c8t,n, \u03bdn). The most\nchallenging aspect of inference is then sampling the posterior distribution over discrete connections,\nA. There may be many posterior modes corresponding to different patterns of connectivity.\nMoreover, am\u2192n and wm\u2192n are often highly correlated, which leads to poor mixing of na\u00efve Gibbs\nsampling. Fortunately, when the observations are Gaussian, we may integrate over possible weights\nand sample the binary adjacency matrix from its collapsed conditional distribution.\nWe combine the conditionally independent Gaussian priors on {wm\u2192n} and bn into a joint Gaussian\ndistribution, wn |{vn}, \u03b8 \u223c N (wn | \u00b5n, \u03a3n), where \u03a3n is a block diagonal covariance matrix.\nSince \u03c8t,n is linear in wn (see Eq. 3), a Gaussian likelihood is conjugate with this Gaussian prior,\ngiven an and $\\widehat{S} = \\{\\widehat{s}_t\\}_{t=1}^{T}$. This yields the following closed-form conditional:\n\n$$p(w_n \\mid \\widehat{S}, a_n, \\mu_n, \\Sigma_n) \\propto \\mathcal{N}(w_n \\mid \\mu_n, \\Sigma_n) \\prod_{t=1}^{T} \\mathcal{N}(s_{t,n} \\mid (a_n \\odot w_n)^{\\mathsf{T}} \\widehat{s}_t, \\nu_n) \\propto \\mathcal{N}(w_n \\mid \\widetilde{\\mu}_n, \\widetilde{\\Sigma}_n),$$\n\n$$\\widetilde{\\Sigma}_n = \\left[ \\Sigma_n^{-1} + \\left( \\widehat{S}^{\\mathsf{T}} (\\nu_n^{-1} I) \\widehat{S} \\right) \\odot (a_n a_n^{\\mathsf{T}}) \\right]^{-1}, \\qquad \\widetilde{\\mu}_n = \\widetilde{\\Sigma}_n \\left[ \\Sigma_n^{-1} \\mu_n + \\left( \\widehat{S}^{\\mathsf{T}} (\\nu_n^{-1} I) s_{:,n} \\right) \\odot a_n \\right].$$\n\nNow, consider the conditional distribution of an, integrating out the corresponding weights.\nThe prior distribution over an is a product of Bernoulli distributions with parameters\n$\\rho_n = \\{\\rho(u_m, u_n, \\theta)\\}_{m=1}^{N}$. The conditional distribution is proportional to the ratio of the prior\nand posterior partition functions,\n\n$$p(a_n \\mid \\widehat{S}, \\rho_n, \\mu_n, \\Sigma_n) = \\int p(a_n, w_n \\mid \\widehat{S}, \\rho_n, \\mu_n, \\Sigma_n) \\, dw_n \\propto p(a_n \\mid \\rho_n) \\, \\frac{ |\\Sigma_n|^{-1/2} \\exp\\left\\{ -\\tfrac{1}{2} \\mu_n^{\\mathsf{T}} \\Sigma_n^{-1} \\mu_n \\right\\} }{ |\\widetilde{\\Sigma}_n|^{-1/2} \\exp\\left\\{ -\\tfrac{1}{2} \\widetilde{\\mu}_n^{\\mathsf{T}} \\widetilde{\\Sigma}_n^{-1} \\widetilde{\\mu}_n \\right\\} }.$$\n\nThus, we perform a joint update of an and wn by collapsing out the weights to directly sample the\nbinary entries of an. We iterate over each entry, am\u2192n, and sample it from its conditional distribution\ngiven $\\{a_{m' \\to n}\\}_{m' \\neq m}$. Having sampled an, we sample wn from its Gaussian conditional.\n\n3.2 P\u00f3lya-gamma augmentation for discrete observations\n\nNow, let us turn to the non-conjugate case of discrete count observations. The P\u00f3lya-gamma augmentation\n[14] introduces auxiliary variables, \u03c9t,n, conditioned upon which the discrete likelihood\nappears Gaussian and our collapsed Gibbs updates apply.\n
The integral identity underlying this scheme\nis\n\n$$c\\, \\frac{(e^{\\psi})^{a}}{(1 + e^{\\psi})^{b}} = c\\, 2^{-b} e^{\\kappa \\psi} \\int_{0}^{\\infty} e^{-\\omega \\psi^{2}/2}\\, p_{\\mathrm{PG}}(\\omega \\mid b, 0)\\, d\\omega, \\qquad (5)$$\n\nwhere \u03ba = a \u2212 b/2 and pPG(\u03c9 | b, 0) is the density of the P\u00f3lya-gamma distribution PG(b, 0), which\ndoes not depend on \u03c8. Notice that the discrete likelihoods in Table 1 can all be rewritten like\nthe left-hand side of (5), for some a, b, and c that are functions of s and \u03bd. Using (5) along with\npriors p(\u03c8) and p(\u03bd), we write the joint density of (\u03c8, s, \u03bd) as\n\n$$p(s, \\nu, \\psi) = \\int_{0}^{\\infty} p(\\nu)\\, p(\\psi)\\, c(s, \\nu)\\, 2^{-b(s,\\nu)} e^{\\kappa(s,\\nu) \\psi} e^{-\\omega \\psi^{2}/2}\\, p_{\\mathrm{PG}}(\\omega \\mid b(s, \\nu), 0)\\, d\\omega. \\qquad (6)$$\n\nThe integrand of Eq. 6 defines a joint density on (s, \u03bd, \u03c8, \u03c9) which admits p(s, \u03bd, \u03c8) as a marginal\ndensity. Conditioned on the auxiliary variable, \u03c9, the likelihood as a function of \u03c8 is,\n\n$$p(s \\mid \\psi, \\nu, \\omega) \\propto e^{\\kappa(s,\\nu) \\psi} e^{-\\omega \\psi^{2}/2} \\propto \\mathcal{N}\\left( \\omega^{-1} \\kappa(s, \\nu) \\mid \\psi, \\omega^{-1} \\right).$$\n\nThus, after conditioning on s, \u03bd, and \u03c9, we effectively have a linear Gaussian likelihood for \u03c8.\nWe apply this augmentation scheme to the full model, introducing auxiliary variables, \u03c9t,n for each\nspike count, st,n. Given these variables, the conditional distribution of wn can be computed in closed\nform, as before. Let \u03ban = [\u03ba(s1,n, \u03bdn), . . . , \u03ba(sT,n, \u03bdn)] and \u2126n = diag([\u03c91,n, . . . , \u03c9T,n]). Then\nwe have $p(w_n \\mid s_n, \\widehat{S}, a_n, \\mu_n, \\Sigma_n, \\omega_n, \\nu_n) \\propto \\mathcal{N}(w_n \\mid \\widetilde{\\mu}_n, \\widetilde{\\Sigma}_n)$, where\n\n$$\\widetilde{\\Sigma}_n = \\left[ \\Sigma_n^{-1} + \\left( \\widehat{S}^{\\mathsf{T}} \\Omega_n \\widehat{S} \\right) \\odot (a_n a_n^{\\mathsf{T}}) \\right]^{-1}, \\qquad \\widetilde{\\mu}_n = \\widetilde{\\Sigma}_n \\left[ \\Sigma_n^{-1} \\mu_n + \\left( \\widehat{S}^{\\mathsf{T}} \\kappa_n \\right) \\odot a_n \\right].$$\n\nHaving introduced auxiliary variables, we must now derive Markov transitions to update them as\nwell. Fortunately, the P\u00f3lya-gamma distribution is designed such that the conditional distribution of\nthe auxiliary variables is simply a \u201ctilted\u201d P\u00f3lya-gamma distribution,\n\np(\u03c9t,n | st,n, \u03bdn, \u03c8t,n) = pPG(\u03c9t,n | b(st,n, \u03bdn), \u03c8t,n).\n\nThese auxiliary variables are conditionally independent given the activation and hence can be\nsampled in parallel. Moreover, efficient algorithms are available to generate P\u00f3lya-gamma random\nvariates [21]. Our Gibbs updates for the remaining parameters and latent variables (\u03bdn, un, vn, and \u03b8)\nare described in the supplementary material. A Python implementation of our inference algorithm is\navailable at https://github.com/slinderman/pyglm.\n\nFigure 2: Weighted adjacency matrices showing inferred networks and connection probabilities for synthetic\ndata. (a,d) True network. (b,e) Posterior mean using joint inference of network GLM. (c,f) MAP estimation.\n\n4 Synthetic Data Experiments\n\nThe need for network models is most pressing in recordings of large populations where the network\nis difficult to estimate and even harder to interpret. To assess the robustness and scalability of our\nframework, we apply our methods to simulated data with known ground truth. We simulate a one\nminute recording (1ms time bins) from a population of 200 neurons with discrete latent types that\ngovern the connection strength via a stochastic block model and continuous latent locations that\ngovern connection probability via a latent distance model.\n
The spikes are generated from a Bernoulli\nobservation model.\nFirst, we show that our approach of jointly inferring the network and its latent variables can provide\ndramatic improvements over alternative approaches. For comparison, consider the two-step procedure\nof Stevenson et al. [18] in which the network is fit with an $\\ell_1$-regularized GLM and then a probabilistic\nnetwork model is fit to the GLM connection weights. The advantage of this strategy is that the\nexpensive GLM fitting is only performed once. However, when the data is limited, both the network\nand the latent variables are uncertain. Our Bayesian approach finds a very accurate network (Fig. 2b)\nby jointly sampling networks and latent variables. In contrast, the standard GLM does not account\nfor latent structure and finds strong connections as well as spuriously correlated neurons (Fig. 2c).\nMoreover, our fully Bayesian approach finds a set of latent locations that mimics the true locations\nand therefore accurately estimates connection probability (Fig. 2e). In contrast, subsequently fitting a\nlatent distance model to the adjacency matrix of a thresholded GLM network finds an embedding\nthat has no resemblance to the true locations, which is reflected in its poor estimate of connection\nprobability (Fig. 2f).\n\nFigure 3: Scalability of our inference algorithm as a function of: (a) the number of time bins, T ; (b) the number\nof neurons, N; and (c) the average sparsity of the network, \u03c1. Wall-clock time is divided into time spent sampling\nauxiliary variables (\u201cObs.\u201d) and time spent sampling the network (\u201cNet.\u201d).\n\nNext, we address the scalability of our MCMC algorithm. Three major parameters govern the\ncomplexity of inference: the number of time bins, T ; the number of neurons, N; and the level of\nsparsity, \u03c1.\n
The following experiments were run on a quad-core Intel i5 with 6GB of RAM. As shown\nin Fig. 3a, the wall clock time per iteration scales linearly with T since we must resample N T auxiliary\nvariables. We scale at least quadratically with N due to the network, as shown in Fig. 3b. However,\nthe total cost could actually be worse than quadratic since the cost of updating each connection could\ndepend on N. Fortunately, the complexity of our collapsed Gibbs sampling algorithm only depends\non the number of incident connections, d, or equivalently, the sparsity \u03c1 = d/N. Speci\ufb01cally, we\nmust solve a linear system of size d, which incurs a cubic cost, as seen in Fig. 3c.\n\n5 Retinal Ganglion Cells\n\nFinally, we demonstrate the ef\ufb01cacy of this approach with an application to spike trains simultaneously\nrecorded from a population of 27 retinal ganglion cells (RGCs), which have previously been studied\nby Pillow et al. [13]. Retinal ganglion cells respond to light shown upon their receptive \ufb01eld. Thus, it\nis natural to characterize these cells by the location of their receptive \ufb01eld center. Moreover, retinal\nganglion cells come in a variety of types [16]. This population is comprised of two types of cells, on\nand off cells, which are characterized by their response to visual stimuli. On cells increase their \ufb01ring\nwhen light is shone upon their receptive \ufb01eld; off cells decrease their \ufb01ring rate in response to light in\ntheir receptive \ufb01eld. In this case, the population is driven by a binary white noise stimulus. Given\nthe stimulus, the cell locations and types are readily inferred. Here, we show how these intuitive\nrepresentations can be discovered in a purely unsupervised manner given one minute of spiking data\nalone and no knowledge of the stimulus.\nFigure 4 illustrates the results of our analysis. 
Since the data are binned at 1ms resolution, we have\nat most one spike per bin and we use a Bernoulli observation model. We fit the 12 network models\nof Table 2 (4 adjacency models and 3 weight models), and we find that, in terms of predictive log\nlikelihood of held-out neurons, a latent distance model of the adjacency matrix and SBM of the\nweight matrix performs best (Fig. 4a). See the supplementary material for a detailed description of\nthis comparison. Looking into the latent locations underlying the adjacency matrix of our network GLM\n(NGLM), we find that the inferred distances between cells are highly correlated with the distances\nbetween the true locations. For comparison, we also fit a 2D Bernoulli linear dynamical system\n(LDS), the Bernoulli equivalent of the Poisson LDS [7], and we take rows of the N\u00d72 emission\nmatrix as locations. In contrast to our network GLM, the distances between LDS locations are nearly\nuncorrelated with the true distances (Fig. 4b) since the LDS does not capture the fact that distance\nonly affects the probability of connection, not the weight. Not only are our distances accurate, the\ninferred locations are nearly identical to the true locations, up to an affine transformation. In Fig. 4c,\nsemitransparent markers show the inferred on cell locations, which have been rotated and scaled to\nbest align with the true locations shown by the outlined marks. Based solely on patterns of correlated\nspiking, we have recovered the receptive field arrangements.\n\nFigure 4: Using our framework, retinal ganglion cell types and locations can be inferred from spike trains alone.\n(a) Model comparison. (b) True and inferred distances between cells. (c) True and inferred cell locations. (d-f)\nInferred network, connection probability, and mean weight, respectively. See main text for further details.\n\nFig. 4d shows the inferred network, $A \\odot W$, under a latent distance model of connection probability\nand a stochastic block model for connection weight. The underlying connection probabilities from\nthe distance model are shown in Fig. 4e. Finally, Fig. 4f shows that we have discovered not only\nthe cell locations, but also their latent types. With an SBM, the mean weight is a function of latent\ntype, and under the posterior, the neurons are clearly clustered into the two true types that exhibit the\nexpected within-class excitation and between-class inhibition.\n\n6 Conclusion\n\nOur results with both synthetic and real neural data provide compelling evidence that our methods can\nfind meaningful structure underlying neural spike trains. Given the extensive work on characterizing\nretinal ganglion cell responses, we have considerable evidence that the representation we learn from\nspike trains alone is indeed the optimal way to summarize this population of cells. This lends us\nconfidence that we may trust the representations learned from spike trains recorded from more\nenigmatic brain areas as well. While we have omitted stimulus from our models and only used\nit for confirming types and locations, in practice we could incorporate it into our model and even\ncapture type- and location-dependent patterns of stimulus dependence with our hierarchical approach.\nLikewise, the network GLM could be combined with the PLDS as in Vidne et al. [20] to capture\nsources of low dimensional, shared variability.\nLatent functional networks underlying spike trains can provide unique insight into the structure of\nneural populations. Looking forward, methods that extract interpretable representations from complex\nneural data, like those developed here, will be key to capitalizing on the dramatic advances in neural\nrecording technology.\n
We have shown that networks provide a natural bridge to connect neural types\nand features to spike trains, and demonstrated promising results on both real and synthetic data.\n\nAcknowledgments. We thank E. J. Chichilnisky, A. M. Litke, A. Sher and J. Shlens for retinal data. SWL is\nsupported by the Simons Foundation SCGB-418011. RPA is supported by NSF IIS-1421780 and the Alfred P.\nSloan Foundation. JWP was supported by grants from the McKnight Foundation, Simons Collaboration on the\nGlobal Brain (SCGB AWD1004351), NSF CAREER Award (IIS-1150186), and NIMH grant MH099611.\n\nReferences\n\n[1] M. B. Ahrens, M. B. Orger, D. N. Robson, J. M. Li, and P. J. Keller. Whole-brain functional imaging at\ncellular resolution using light-sheet microscopy. Nature methods, 10(5):413\u2013420, 2013.\n\n[2] D. R. Brillinger, H. L. Bryant Jr, and J. P. Segundo. Identification of synaptic interactions. Biological\nCybernetics, 22(4):213\u2013228, 1976.\n\n[3] F. Gerhard, T. Kispersky, G. J. Gutierrez, E. Marder, M. Kramer, and U. Eden. Successful reconstruction of\na physiological circuit with known connectivity from spiking activity alone. PLoS Computational Biology,\n9(7):e1003138, 2013.\n\n[4] R. L. Goris, J. A. Movshon, and E. P. Simoncelli. Partitioning neuronal variability. Nature Neuroscience,\n17(6):858\u2013865, 2014.\n\n[5] B. F. Grewe, D. Langer, H. Kasper, B. M. Kampa, and F. Helmchen. High-speed in vivo calcium imaging\nreveals neuronal network activity with near-millisecond precision. Nature methods, 7(5):399\u2013405, 2010.\n\n[6] P. D. Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. Advances in\nNeural Information Processing Systems 20, 20:1\u20138, 2008.\n\n[7] J. H. Macke, L. Buesing, J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani. Empirical models\nof spiking in neural populations.\n
In Advances in Neural Information Processing Systems, pages 1350\u20131358, 2011.\n\n[8] T. J. Mitchell and J. J. Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023\u20131032, 1988.\n\n[9] K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077\u20131087, 2001.\n\n[10] P. Orbanz and D. M. Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):437\u2013461, 2015.\n\n[11] L. Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4):243\u2013262, 2004.\n\n[12] J. W. Pillow and J. Scott. Fully Bayesian inference for neural models with negative-binomial spiking. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1898\u20131906, 2012.\n\n[13] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207):995\u2013999, 2008.\n\n[14] N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using P\u00f3lya\u2013gamma latent variables. Journal of the American Statistical Association, 108(504):1339\u20131349, 2013.\n\n[15] R. Prevedel, Y.-G. Yoon, M. Hoffmann, N. Pak, G. Wetzstein, S. Kato, T. Schr\u00f6del, R. Raskar, M. Zimmer, E. S. Boyden, et al. Simultaneous whole-animal 3D imaging of neuronal activity using light-field microscopy. Nature Methods, 11(7):727\u2013730, 2014.\n\n[16] J. R. Sanes and R. H. Masland. The types of retinal ganglion cells: current status and implications for neuronal classification. 
Annual Review of Neuroscience, 38:221\u2013246, 2015.\n\n[17] D. Soudry, S. Keshri, P. Stinson, M.-h. Oh, G. Iyengar, and L. Paninski. A shotgun sampling solution for the common input problem in neural connectivity inference. arXiv preprint arXiv:1309.3724, 2013.\n\n[18] I. H. Stevenson, J. M. Rebesco, N. G. Hatsopoulos, Z. Haga, L. E. Miller, and K. P. K\u00f6rding. Bayesian inference of functional connectivity and network structure from spikes. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17(3):203\u2013213, 2009.\n\n[19] W. Truccolo, U. T. Eden, M. R. Fellows, J. P. Donoghue, and E. N. Brown. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2):1074\u20131089, 2005. doi: 10.1152/jn.00697.2004.\n\n[20] M. Vidne, Y. Ahmadian, J. Shlens, J. W. Pillow, J. Kulkarni, A. M. Litke, E. J. Chichilnisky, E. P. Simoncelli, and L. Paninski. Modeling the impact of common noise inputs on the network activity of retinal ganglion cells. Journal of Computational Neuroscience, 33(1):97\u2013121, 2012.\n\n[21] J. Windle, N. G. Polson, and J. G. Scott. Sampling P\u00f3lya-gamma random variates: alternate and approximate techniques. arXiv preprint arXiv:1405.0506, 2014.\n", "award": [], "sourceid": 1074, "authors": [{"given_name": "Scott", "family_name": "Linderman", "institution": "Columbia University"}, {"given_name": "Ryan", "family_name": "Adams", "institution": "Harvard and Twitter"}, {"given_name": "Jonathan", "family_name": "Pillow", "institution": "Princeton University"}]}