{"title": "Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing", "book": "Advances in Neural Information Processing Systems", "page_first": 180, "page_last": 188, "abstract": "Statistical features of neuronal spike trains are known to be non-Poisson. Here, we investigate the extent to which the non-Poissonian feature affects the efficiency of transmitting information on fluctuating firing rates. For this purpose, we introduce the Kullbuck-Leibler (KL) divergence as a measure of the efficiency of information encoding, and assume that spike trains are generated by time-rescaled renewal processes. We show that the KL divergence determines the lower bound of the degree of rate fluctuations below which the temporal variation of the firing rates is undetectable from sparse data. We also show that the KL divergence, as well as the lower bound, depends not only on the variability of spikes in terms of the coefficient of variation, but also significantly on the higher-order moments of interspike interval (ISI) distributions. We examine three specific models that are commonly used for describing the stochastic nature of spikes (the gamma, inverse Gaussian (IG) and lognormal ISI distributions), and find that the time-rescaled renewal process with the IG distribution achieves the largest KL divergence, followed by the lognormal and gamma distributions.", "full_text": "Coding ef\ufb01ciency and detectability of rate \ufb02uctuations\n\nwith non-Poisson neuronal \ufb01ring\n\nDepartment of Statistical Modeling\nThe Institute of Statistical Mathematics\n\n10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan\n\nShinsuke Koyama\u2217\n\nskoyama@ism.ac.jp\n\nAbstract\n\nStatistical features of neuronal spike trains are known to be non-Poisson. Here, we\ninvestigate the extent to which the non-Poissonian feature affects the ef\ufb01ciency of\ntransmitting information on \ufb02uctuating \ufb01ring rates. 
For this purpose, we introduce the Kullback-Leibler (KL) divergence as a measure of the efficiency of information encoding, and assume that spike trains are generated by time-rescaled renewal processes. We show that the KL divergence determines the lower bound of the degree of rate fluctuations below which the temporal variation of the firing rates is undetectable from sparse data. We also show that the KL divergence, as well as the lower bound, depends not only on the variability of spikes in terms of the coefficient of variation, but also significantly on the higher-order moments of interspike interval (ISI) distributions. We examine three specific models that are commonly used for describing the stochastic nature of spikes (the gamma, inverse Gaussian (IG) and lognormal ISI distributions), and find that the time-rescaled renewal process with the IG distribution achieves the largest KL divergence, followed by the lognormal and gamma distributions.

1 Introduction

Characterizing the statistical features of spike time sequences in the brain is important for understanding how the brain represents information about stimuli or actions in the sequences of spikes. Although the spike trains recorded from in vivo cortical neurons are known to be highly irregular [20, 24], a recent non-stationary analysis has revealed that individual neurons signal with non-Poisson firing, the characteristics of which are strongly correlated with the function of the cortical area [21].

This raises the question of what the neural coding advantages of non-Poisson spiking are. It could be that the precise timing of spikes carries additional information about the stimuli or actions [6, 15]. It is also possible that the efficiency of transmitting fluctuating rates might be enhanced by non-Poisson firing [5, 17].
Here, we explore the latter possibility.

In the problem of estimating firing rates, there is a minimum degree of rate fluctuation below which a rate estimator cannot detect the temporal variation of the firing rate [23]. If, for instance, the degree of temporal variation of the rate is on the same order as that of the noise, a constant rate might be chosen as the most likely estimate for a given spike train. It is, therefore, interesting to see how the minimum degree of rate fluctuation depends on the non-Poissonian feature of spike trains.

∗http://skoyama.blogspot.jp

In this study, we investigate the extent to which the non-Poissonian feature of spike trains affects the encoding efficiency of rate fluctuations. In addition, we address the question of how the detectability of rate fluctuations depends on the encoding efficiency. For this purpose, we introduce the Kullback-Leibler (KL) divergence to measure the encoding efficiency, and assume that spike sequences are generated by time-rescaled renewal processes. With the aid of analytical and numerical studies, we suggest that the lower bound of detectable rate fluctuations, below which the empirical Bayes decoder cannot detect the rate fluctuations, is uniquely determined by the KL divergence. By examining three specific models (the time-rescaled renewal process with the gamma, inverse Gaussian (IG) and lognormal interspike interval (ISI) distributions), it is shown that the KL divergence, as well as the lower bound, depends not only on the first- and second-order moments, but also significantly on the higher-order moments of the ISI distributions.
We also find that among the three ISI distributions, the IG distribution achieves the highest efficiency of coding information on rate fluctuations.

2 Encoding rate fluctuations using time-rescaled renewal processes

Definitions of time-rescaled renewal processes and KL divergence

We introduce time-rescaled renewal processes as a model of neuronal spike trains constructed in the following way. Let f_κ(y) be a family of ISI distributions with unit mean (i.e., ∫_0^∞ y f_κ(y)dy = 1), where κ controls the shape of the distribution, and let λ(t) be a fluctuating firing rate. A sequence of spikes {t_i} := {t_1, t_2, ..., t_n} is generated in the following steps: (i) Draw ISIs {y_1, y_2, ..., y_n} independently from f_κ(y), and arrange the ISIs sequentially to form a spike train of unit rate; the ith spike is given by summing the previous ISIs as s_i = Σ_{j=1}^i y_j. (ii) Transform {s_1, s_2, ..., s_n} to {t_1, t_2, ..., t_n} according to t_i = Λ^{-1}(s_i), where Λ^{-1} is the inverse of the function Λ(t) = ∫_0^t λ(u)du. This transformation ensures that the instantaneous firing rate of {t_i} corresponds to λ(t), while the shape of the ISI distribution f_κ(y), which characterizes the firing irregularity, is unchanged in time. This is in agreement with the empirical fact that the degree of irregularity in neuronal firing is generally maintained in cortical processing [21, 22], while the firing rate λ(t) changes in time. The probability density of the occurrence of spikes at {t_i} is then given by

p_κ({t_i}|{λ(t)}) = Π_{i=1}^n λ(t_i) f_κ(Λ(t_i) − Λ(t_{i−1})),   (1)

where t_0 = 0.

We next introduce the KL divergence for measuring the encoding efficiency of fluctuating rates.
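As a concrete illustration, the two-step generative construction (i)-(ii) above can be sketched in a few lines of Python. This is a sketch of mine, not the author's code; the gamma ISI family and the piecewise-constant rate grid are illustrative choices (any unit-mean f_κ and positive λ(t) would do):

```python
import numpy as np

def time_rescaled_renewal(rate, dt, kappa, seed=0):
    """Generate spikes {t_i} by time-rescaling a unit-rate renewal process.

    rate : samples of lambda(t) on a grid with step dt (must be positive)
    kappa: shape of the unit-mean gamma ISI density f_kappa(y)
    """
    rng = np.random.default_rng(seed)
    # Lambda(t) = int_0^t lambda(u) du, evaluated on the grid.
    Lam = np.cumsum(rate) * dt
    # Step (i): unit-mean gamma ISIs (mean 1, CV = 1/sqrt(kappa)),
    # summed to unit-rate spike times s_i.
    isis = rng.gamma(kappa, 1.0 / kappa, size=int(2 * Lam[-1] + 100))
    s = np.cumsum(isis)
    s = s[s < Lam[-1]]
    # Step (ii): t_i = Lambda^{-1}(s_i), by inverting Lambda on the grid.
    t_grid = np.arange(len(rate)) * dt
    return np.interp(s, Lam, t_grid)

dt, T, mu = 0.01, 2000.0, 1.0
t = np.arange(0.0, T, dt)
rate = mu + 0.3 * np.sin(t / 10.0)      # sinusoidal rate used in Sec. 2
spikes = time_rescaled_renewal(rate, dt, kappa=4.0)
print(len(spikes) / T)                  # empirical mean rate, close to mu
```

Because the time rescaling only warps the spike times, the empirical mean rate matches the mean of λ(t) while the ISI shape (here gamma with CV = 0.5) is preserved.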
For this purpose, we assume that λ(t) is ergodic with a stationary distribution p(λ), the mean of which is given by μ:

⟨λ⟩_λ := ∫_0^∞ λ p(λ)dλ = lim_{T→∞} (1/T) ∫_0^T λ(t)dt = μ.   (2)

Consider the probability density of a renewal process that has the same ISI density f_κ(x) and the constant rate μ:

p_κ({t_i}|μ) = Π_{i=1}^n μ f_κ(μ(t_i − t_{i−1})).   (3)

The KL divergence between p_κ({t_i}|{λ(t)}) and p_κ({t_i}|μ) is then defined as

D_κ(λ(t)||μ) := lim_{T→∞} (1/T) Σ_{n=0}^∞ ∫_0^T ∫_{t_1}^T ⋯ ∫_{t_{n−1}}^T p_κ({t_i}|{λ(t)}) log [p_κ({t_i}|{λ(t)}) / p_κ({t_i}|μ)] dt_1 dt_2 ⋯ dt_n.   (4)

Since it is defined as the entropy of a renewal process with the fluctuating rate λ(t) relative to that with the constant rate μ, D_κ(λ(t)||μ) can be interpreted as the amount of information on the rate fluctuations encoded into spike trains. Note that a similar quantity has been introduced in [3], where the quantity was computed only under a Poisson model.

Substituting Eqs. (1) and (3) into Eq.
(4) and further assuming ergodicity of spike trains, the KL divergence can be expressed as

D_κ(λ(t)||μ) = lim_{n→∞} (1/(t_n − t_0)) log [p_κ({t_i}|{λ(t)}) / p_κ({t_i}|μ)]
= lim_{n→∞} (1/(t_n − t_0)) Σ_{i=1}^n [log λ(t_i) + log f_κ(Λ(t_i) − Λ(t_{i−1})) − log μ − log f_κ(μ(t_i − t_{i−1}))].   (5)

This expression can be used for computing the KL divergence numerically by simulating a large number of spikes n ≫ 1.

Three ISI distributions and their KL divergence

In order to examine the behavior of the KL divergence, we use three specific ISI distributions for f_κ(y) (the gamma, inverse Gaussian (IG) and lognormal distributions), which have been used to describe the stochastic nature of ISIs [9, 10, 14]. These distributions and their coefficient of variation (CV = √Var(X)/E(X)) are given by

gamma: f_κ(y) = κ^κ y^{κ−1} e^{−κy} / Γ(κ),   CV = 1/√κ,   (6)
IG: f_κ(y) = √(κ/(2πy^3)) exp[−κ(y − 1)^2/(2y)],   CV = 1/√κ,   (7)
lognormal: f_κ(y) = (1/(y√(2πκ))) exp[−(log y + κ/2)^2/(2κ)],   CV = √(e^κ − 1),   (8)

where Γ(κ) = ∫_0^∞ x^{κ−1} e^{−x} dx is the gamma function. Figure 1a illustrates the shape of the three distributions for three different values of CV.

The KL divergence for the three models is analytically solvable when the rate fluctuation has a long time scale relative to the mean ISI. Here, we show the derivation for the gamma distribution. (The derivations for the IG and lognormal distributions are essentially the same.) Inserting Eq. (6) into Eq.
(5) leads to

D_κ(λ(t)||μ) = lim_{n→∞} (1/(t_n − t_0)) Σ_{i=1}^n [log λ(t_i) + (κ − 1) log(Λ(t_i) − Λ(t_{i−1})) − (κ − 1) log(t_i − t_{i−1})] − κμ log μ,   (9)

where we used (1/(t_n − t_0)) ∫_{t_0}^{t_n} λ(t)dt → μ and n/(t_n − t_0) → μ as n → ∞. By introducing the "averaged" firing rate in the ith ISI, λ̄_i := (Λ(t_i) − Λ(t_{i−1}))/(t_i − t_{i−1}), we obtain log(Λ(t_i) − Λ(t_{i−1})) = log λ̄_i + log(t_i − t_{i−1}). Assuming that the time scale of the rate fluctuation is longer than the mean ISI, so that λ̄_i is approximated by λ(t_i), Eq. (9) becomes

D_κ(λ(t)||μ) = κ lim_{n→∞} (1/(t_n − t_0)) Σ_{i=1}^n log λ(t_i) − κμ log μ
= κ [lim_{T→∞} (1/T) ∫_0^T Σ_i δ(t − t_i) log λ(t)dt − μ log μ].   (10)

The fluctuation in the apparent spike count is given by the variance-to-mean ratio, as represented by the Fano factor [8]. For the renewal process in which ISIs are drawn from a given distribution function, it is proven that the Fano factor is related to the ISI variability with F ≈ CV^2 [4]. Thus, for a long range time scale in which a serial correlation of spikes is negligible, the spike train in Eq. (10) can be approximated to

Σ_{i=1}^n δ(t − t_i) ≈ λ(t) + √(λ(t)/κ) ξ(t),   (11)

where ξ(t) is a fluctuating process such that ⟨ξ(t)⟩ = 0 and ⟨ξ(t)ξ(t′)⟩ = δ(t − t′). Using this, the first term on the rhs of (10) can be evaluated as

lim_{T→∞} (1/T) ∫_0^T λ(t) log λ(t)dt + lim_{T→∞} (1/T) ∫_0^T √(λ(t)/κ) log λ(t) ξ(t)dt = ⟨λ log λ⟩_λ,   (12)

where the second term on the lhs has vanished due to a property of stochastic integrals. Therefore,
Therefore,\nthe KL divergence of the gamma distribution is obtained as\n\nT $ T\n0 (\u03bb(t)/\u03ba log \u03bb(t)\u03be(t)dt = \"\u03bb log \u03bb#\u03bb,\n\n\u03bb(t) log \u03bb(t)dt + lim\nT\u2192\u221e\n\nT $ T\n\nlim\nT\u2192\u221e\n\n(13)\nIn the same way, the KL divergence for the IG and lognormal distributions are, respectively, derived\nas\n\nD\u03ba(\u03bb(t)||\u00b5) = \u03ba&\"\u03bb log \u03bb#\u03bb \u2212 \u00b5 log \u00b5\u2019.\n2\"\u03bb log \u03bb#\u03bb + \u03ba + 1\n\nD\u03ba(\u03bb(t)||\u00b5) = \u00b5\n2\n\n2\u00b5 \"(\u03bb \u2212 \u00b5)2#\u03bb,\n\nlog \u00b5 \u2212\n\n1\n\nand\n\n(14)\n\n(15)\n\nD\u03ba(\u03bb(t)||\u00b5) = \u00b5\n2\u03ba\n\n(log \u00b5)2 \u2212\n\nlog \u00b5\n\u03ba \"\u03bb log \u03bb#\u03bb +\n\n1\n2\u03ba\"\u03bb(log \u03bb)2#\u03bb.\n\nSee the supplementary material for the details of their derivations.\nResults\nWe compute the KL divergence for the three models, in which the rate \ufb02uctuates according to the\nOrnstein-Uhlenbeck process. Formally, the rate process is given by \u03bb(t) = [x(t)] +, where [\u00b7]+ is\nthe recti\ufb01cation function:\n(16)\n\notherwise\nand x(t) is derived from the Ornstein-Uhlenbeck process:\n\n[x]+ =, x, x > 0\n\n0,\n\ndx(t)\n\ndt\n\nx(t) \u2212 \u00b5\n\n\u03c4\n\n= \u2212\n\n+ \u03c3) 2\n\n\u03c4\n\n\u03be(t),\n\n(17)\n\nwhere \u03be(t) is the Gaussian white noise.\nFigure 1b depicts the KL divergence as a function of \u03c3 for C V =0.6, 1 and 1.5. The analytical results\n(the solid lines) are in good agreementwith the numerical results (the error bars). The KL divergence\nfor the three models increases as \u03c3 is increased and as C V is decreased, which is rather obvious\nsince larger \u03c3 and smaller CV imply lower noise entropy of spike trains. 
One nontrivial result is that, even if the three models share the same values of σ and CV, the KL divergence of each model significantly differs from that of the others: the IG distribution achieves the largest KL divergence, followed by the lognormal and gamma distributions. The difference in the KL divergence among the three models becomes larger as CV grows larger. Since the three models share the same firing rate λ(t) and CV, it can be concluded that the higher-order (more than second-order) moments of ISI distributions strongly affect the KL divergence.

In order to confirm this result for another rate process, we examine a sinusoidal rate process, λ(t) = μ + σ sin(t/τ), and observe the same behavior as for the Ornstein-Uhlenbeck rate process (Figure 1c).

3 Decoding fluctuating rates using the empirical Bayes method

In this section, we show that the KL divergence (4) determines the lower bound of the degree of rate fluctuation below which the empirical Bayes estimator cannot detect rate fluctuations.

The empirical Bayes method

We consider decoding a fluctuating rate λ(t) from a given spike train {t_i} := {t_1, ..., t_n} in an observation interval [0, T] by the empirical Bayes method. Let x(t) ∈ R be a latent variable that

Figure 1: (a) The gamma (blue), IG (green) and lognormal (red) ISI distribution functions for CV = 0.6, 1 and 1.5. (b) The KL divergence as a function of σ for CV = 0.6, 1 and 1.5, when the rate fluctuates according to the Ornstein-Uhlenbeck process (17) with μ = 1 and τ = 10.
The blue, green and red indicate the KL divergence for the gamma, IG and lognormal distributions, respectively. The lines represent the theoretical values obtained by Eqs. (13), (14) and (15), and the error bars represent the average and standard deviation numerically computed according to Eq. (5) with n = 50,000 and 10 trials. (c) The KL divergence for the sinusoidally modulated rate, λ(t) = μ + σ sin(t/τ), with μ = 1 and τ = 10.

is transformed from λ(t) via the log-link function x(t) = log λ(t). For the inference of λ(t) from {t_i}, we use a prior distribution of x(t) under which a large gradient of x(t) is penalized:

p_γ({x(t)}) ∝ exp[−(1/(2γ^2)) ∫_0^T (dx(t)/dt)^2 dt],   (18)

where the hyperparameter γ controls the roughness of the latent process x(t): with small γ, the model favors a constant latent process, and vice versa. By inverting the conditional probability distribution with Bayes' theorem, the posterior distribution of {x(t)} is obtained as

p_{κ,γ}({x(t)}|{t_i}) = p_κ({t_i}|{x(t)}) p_γ({x(t)}) / p_{κ,γ}({t_i}).   (19)

The hyperparameters γ and κ, which represent the roughness of the latent process and the shape of the ISI density function, can be determined by maximizing the marginal likelihood [16] defined by

p_{κ,γ}({t_i}) = ∫ p_κ({t_i}|{x(t)}) p_γ({x(t)}) D{x(t)},   (20)

where ∫ D{x(t)} represents the integration over all possible latent process paths. Under a set of hyperparameters γ̂ and κ̂ that are determined by the marginal likelihood maximization, we can determine the maximum a posteriori (MAP) estimate of the latent process x̂(t). The method for implementing the empirical Bayes analysis is summarized in the Appendix.

Detectability of rate fluctuations

We first examine the gamma distribution (6).
For synthetic spike trains (n = 1,000) generated by the time-rescaled renewal process with the gamma ISI distribution, in which the rate fluctuates according to the Ornstein-Uhlenbeck process (17) with μ = 1 and τ = 10, we attempt to decode λ(t) using the empirical Bayes decoder. Depending on the amplitude of the rate fluctuation σ and the CV of f_κ(y), the empirical Bayes decoder provides two qualitatively distinct rate estimations: (I) a fluctuating rate estimation (γ̂ > 0) for large σ and small CV, or (II) a constant rate estimation (γ̂ = 0) for small σ and large CV (Figure 2a). When σ is increased or CV is decreased, the empirical Bayes estimator exhibits a phase transition corresponding to the switch of the most likely rate estimation from (II) to (I) (Figure 2b). Note that below the critical point of this phase transition, the empirical Bayes method provides a constant rate as the most likely estimation even if the true rate process fluctuates. The critical point thus gives the lower bound for the degree of detectable rate fluctuations. It is also confirmed, using numerical simulations, that the phase transition occurs not only with the gamma distribution, but also with the IG and lognormal distributions (Figure 2c,d).

For the time-rescaled renewal process with the gamma ISI distribution, we could analytically derive the formula that the lower bound satisfies:

D_κ(λ(t)||μ) = φ(0) / [4 max_η ∫_0^∞ φ(u) e^{−ηu} du],   (21)

where φ(u) is the correlation function of λ(t). (See the supplementary material for the derivation.) Eq. (21) is in good agreement with the simulation result for the entire parameter space (the solid line in Figure 2a).

The expression of Eq. (21) itself does not depend on the gamma distribution.
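For the Ornstein-Uhlenbeck rate in the small-fluctuation regime (where rectification is negligible), the correlation function is exponential, φ(u) = σ² e^{−u/τ}, so the right-hand side of Eq. (21) can be evaluated in closed form. The sketch below is my own illustration: it also uses a small-fluctuation expansion of Eq. (13), D ≈ κσ²/(2μ), which is not stated in the text, to turn Eq. (21) into a critical amplitude σ_c for the gamma model:

```python
import numpy as np

mu, tau, sigma = 1.0, 10.0, 0.2

# phi(u) = sigma^2 exp(-u/tau): correlation function of the OU rate.
phi0 = sigma**2
laplace_phi = lambda eta: sigma**2 * tau / (1.0 + eta * tau)  # int_0^inf phi(u) e^{-eta u} du

# RHS of Eq. (21); for exponential phi the maximum over eta >= 0 is at eta = 0,
# so the RHS reduces to 1/(4 tau), independent of sigma.
etas = np.linspace(0.0, 5.0, 501)
rhs = phi0 / (4.0 * max(laplace_phi(e) for e in etas))
print(rhs, 1.0 / (4.0 * tau))

# Small-fluctuation expansion of Eq. (13): D ~= kappa sigma^2 / (2 mu).
# Equating it with the RHS gives a critical amplitude sigma_c.
for cv in (0.6, 1.0, 1.5):
    kappa = 1.0 / cv**2
    sigma_c = np.sqrt(mu / (2.0 * kappa * tau))
    print(f"CV={cv}: sigma_c ~= {sigma_c:.3f}")
```

Under this expansion the detectability condition reads τκσ²/μ ~ O(1), which recovers the Poisson result τσ²/μ ~ O(1) of [23] at κ = 1.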
We investigated whether this formula is also applicable to the IG and lognormal distributions, and found that the theoretical lower bounds (the solid lines in Figure 2c,d) indeed correspond to those obtained by the numerical simulations; this result implies that Eq. (21) is applicable to more general time-rescaled renewal processes.

Figure 2e compares the lower bounds among the three distributions. The lower bound of the IG distribution is the lowest, followed by the lognormal and gamma distributions, which is expected from the result in Figure 1b, as the lower bound is uniquely determined by the KL divergence via Eq. (21).

We also examined the sinusoidally modulated rate, λ(t) = μ + σ sin(t/τ); the qualitative result remains the same (Figure 2f-h).

4 Discussion

In this study, we first examined the extent to which spike trains derived from time-rescaled renewal processes encode information on fluctuating rates. The encoding efficiency is measured by the KL divergence between two renewal processes with fluctuating and constant rates. We showed that the KL divergence significantly differs among the gamma, IG and lognormal ISI distributions, even if these three processes share the same rate fluctuation λ(t) and CV (Figure 1b). This suggests that the higher-order moments of ISIs play an important role in encoding information on fluctuating rates. Among the three distributions, the IG distribution achieves the largest KL divergence, followed by the lognormal and gamma distributions. A similar result has been reported for stationary renewal processes [12].

Since the KL divergence gives the distance between two probability distributions, Eq. (4) is naturally related to the ability to discriminate between a fluctuating rate and a constant rate.
In fact, the lower bound of the degree of rate fluctuation, below which the empirical Bayes decoder cannot discriminate the underlying fluctuating rate from a constant rate, satisfies the formula (21). There commonly exists a lower bound below which the underlying rate fluctuations are undetectable, not only in the empirical Bayes method with the above prior distribution (18), but also with other prior distributions, and in other rate estimators such as a time histogram. The lower bound in these methods has been derived for inhomogeneous Poisson processes as τσ^2/μ ∼ O(1), where τ, σ and μ are the time scale, amplitude and mean of the rate fluctuation, respectively [23]. Thus, Eq. (21), or equivalently τ D_κ(λ(t)||μ) ∼ O(1), is regarded as a generalization to non-Poisson processes. Here, the crucial step for this generalization is incorporating the KL divergence into the formula. Note that the formula (21) was derived analytically under the assumption of the gamma ISI distribution, and was then shown to hold for the IG and lognormal ISI distributions with numerical simulations. The analytical tractability of the gamma family lies in the fact that it is the only scale family that admits the mean as a sufficient statistic. We conjecture, from our results with the three specific models, that Eq.
(21) is applicable to more general time-rescaled renewal processes (even to "non-renewal" processes), which is open to future research.

Figure 2: (a) Left: the phase diagram for sequences generated by the time-rescaled renewal process with the gamma ISI distribution. The ordinate represents the amplitude of rate fluctuation σ, and the abscissa represents the CV of the gamma ISI distribution. The dots represent the result of numerical simulations in which the empirical Bayes decoder provides a fluctuating rate estimation (γ̂ > 0). Each dot is plotted if γ̂ > 0 in more than 20 out of 40 identical trials. The solid line represents the theoretical lower bound obtained by the formula (21). Right: raster plots of sample spike trains and the estimated rates. The dotted lines and the solid lines represent the underlying rates and the estimated rates, respectively. The parameters (CV, σ) of top (γ̂ > 0) and bottom (γ̂ = 0) are (0.6, 0.3) and (1.5, 0.15), respectively. (b) The optimal hyperparameter γ̂ as a function of σ for CV = 0.6. The solid line represents the theoretical value, and the error bars represent the average and standard deviation of γ̂ determined by applying the empirical Bayes algorithm to 40 trials.
(c, d) The phase diagrams for the IG and lognormal ISI distributions. (e) Comparison of the lower bounds among the three models. (f-h) The phase diagrams for the gamma, IG and lognormal ISI distributions, when the rate process is given by λ(t) = μ + σ sin(t/τ) with μ = 1 and τ = 10.

A recent non-stationary analysis has revealed that individual neurons in the cortex signal with non-Poisson firing, which has empirically been characterized by measures based on the second-order moment of ISIs, such as CV and LV [21, 22]. Our results, however, suggest that it may be important to take into account the higher-order moments of ISIs for characterizing the "irregularity" of cortical firing, in order to gain information on fluctuating firing rates. It has also been demonstrated that using non-Poisson spiking models enhances the performance of neural decoding [2, 11, 19]. Our results provide theoretical support for this as well.

Appendix: Implementation of the empirical Bayes method

Discretization

To construct a practical algorithm for performing empirical Bayes decoding, we first divide the time axis into a set of intervals (t_{i−1}, t_i] (i = 1, ..., n). We assume that the firing rate within each interval (t_{i−1}, t_i] does not change drastically (which is a reasonable assumption in practice), so that it can be approximated by a constant value λ_i. Letting T_i = t_i − t_{i−1} be the ith ISI, the probability density of {T_i} ≡ {T_1, T_2, ..., T_n}, given the rate process {λ_i} ≡ {λ_1, λ_2, ..., λ_n}, is obtained from Eq. (1) as p_κ({T_i}|{λ_i}) = Π_{i=1}^n λ_i f_κ(λ_i T_i). The rate process is linked with the latent process via x_i = log λ_i. With the same time discretization, the prior distribution of the latent process {x_i} ≡ {x_1, x_2, ..., x_n}, which corresponds to Eq. (18), is derived as p_γ({x_i}) = p(x_1) Π_{i=2}^n p_γ(x_i|x_{i−1}), where

p_γ(x_i|x_{i−1}) = (1/√(πγ^2(T_i + T_{i−1}))) exp[−(x_i − x_{i−1})^2/(γ^2(T_i + T_{i−1}))],   (22)

and p(x_1) is the probability density function of the initial latent rate variable. p({T_i}|{λ_i}) and p_γ({x_i}) define a discrete-time state space model. We note that this provides a good approximation to the original continuous-time model if the time scale of the rate fluctuation is larger than the mean ISI.

EM algorithm

We assume that the ISI density function can be rewritten in the form of an exponential family distribution with respect to the shape parameter κ:

p_κ(T_i|φ_i) := λ_i f_κ(λ_i T_i) = exp[κ S(T_i, φ_i) − ϕ(κ) + c(T_i, φ_i)],   (23)

with an appropriate parameter representation φ_i = φ(λ_i, κ). Here, κ is the natural parameter of the exponential family and S(T_i, φ_i) is its sufficient statistic. Suppose that the potential ϕ(κ) is a convex function. The expectation of S(T_i, φ_i) is then given by

η = ∫ S(T_i, φ_i) p_κ(T_i|φ_i) dT_i = dϕ(κ)/dκ.   (24)

Since ϕ(κ) is convex, there is a one-to-one correspondence between κ and η, and thus η provides an alternative parametrization to κ [1]. The gamma (6), IG (7) and lognormal (8) distributions are included in this family.

With the parametrization η, the EM algorithm for the state space model is derived as follows. Suppose that we have estimations η̂(m) and γ̂(m) at the mth iteration.
The estimations at the (m+1)th iteration are given by

η̂(m+1) = (1/n) Σ_{i=1}^n ⟨S(T_i, φ(x_i))⟩(m),   (25)

and

γ̂^2(m+1) = (2/(n − 1)) Σ_{i=2}^n ⟨(x_i − x_{i−1})^2⟩(m) / (T_i + T_{i−1}),   (26)

where ⟨·⟩(m) denotes the expectation with respect to the posterior probability of {x_i}, given {T_i}, η̂(m) and γ̂(m). The posterior probability is computed by the Laplace approximation, introduced below. We update η̂ and γ̂ until the estimations converge. The estimation of κ is then transformed from η̂ with Eq. (24).

Laplace approximation

We employ Laplace's method to compute an approximate posterior distribution of {x_i}. Let x = (x_1, x_2, ..., x_n)^t be the column vector of the latent process, (·)^t being the transpose of a vector. The MAP estimate of the latent process is obtained by maximizing the log posterior distribution

l(x) = log p(x_1) + Σ_{i=2}^n log p_γ(x_i|x_{i−1}) + Σ_{i=1}^n log p_κ(T_i|x_i) + const.,   (27)

with respect to x. We use a diffuse prior for p(x_1) so that its contribution vanishes [7]. If p_γ(x_i|x_{i−1}) is log-concave in x_i and x_{i−1}, and p_κ(T_i|x_i) is also log-concave in x_i, computing the MAP estimate is a concave optimization problem [18], which can be solved efficiently by a Newton method. Due to the Markovian structure of the state-space model, the Hessian matrix, J(x) ≡ ∇∇_x l(x), becomes a tridiagonal matrix, which allows us to compute the Newton step in O(n) time [13]. Let x̂ denote the MAP estimation of the posterior probability. The posterior probability is then approximated to a Gaussian whose mean vector and covariance matrix are given by x̂ and −J(x̂)^{−1}, respectively.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 24700287.

References

[1] S. Amari and H. Nagaoka.
Methods of Information Geometry. Oxford University Press, 2000.

[2] R. Barbieri, M. C. Quirk, L. M. Frank, M. A. Wilson, and E. N. Brown. Construction and analysis of non-Poisson stimulus-response models of neural spiking activity. Journal of Neuroscience Methods, 105:25–37, 2001.

[3] N. Brenner, S. P. Strong, R. Koberle, and W. Bialek. Synergy in a neural code. Neural Computation, 12:1531–1552, 2000.

[4] D. R. Cox. Renewal Theory. Chapman and Hall, 1962.

[5] J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani. Inferring neural firing rates from spike trains using Gaussian processes. In Neural Information Processing Systems, volume 20, pages 329–336, 2008.

[6] R. M. Davies, G. L. Gerstein, and S. N. Baker. Measurement of time-dependent changes in the irregularity of neural spiking. Journal of Neurophysiology, 96:906–918, 2006.

[7] J. Durbin and S. J. Koopman. Time Series Analysis by State Space Methods. Oxford University Press, 2001.

[8] U. Fano. Ionization yield of radiations. II. The fluctuations of the number of ions. Physical Review, 72:26–29, 1947.

[9] G. L. Gerstein and B. Mandelbrot. Random walk models for the spike activity of a single neuron. Biophysical Journal, 4:41–68, 1964.

[10] S. Ikeda and J. H. Manton. Capacity of a single spiking neuron channel. Neural Computation, 21:1714–1748, 2009.

[11] A. L. Jacobs, G. Fridman, R. M. Douglas, N. M. Alam, P. E. Latham, G. T. Prusky, and S. Nirenberg. Ruling out and ruling in neural codes. Proceedings of the National Academy of Sciences, 106:5936–5941, 2009.

[12] K. Kang and S. Amari. Discrimination with spike times and ISI distributions. Neural Computation, 20:1411–1426, 2008.

[13] S. Koyama and L. Paninski. Efficient computation of the maximum a posteriori path and parameter estimation in integrate-and-fire and more general state-space models.
Journal of Computational Neuroscience, 29:89–105, 2009.

[14] M. W. Levine. The distribution of the intervals between neural impulses in the maintained discharges of retinal ganglion cells. Biological Cybernetics, 65:459–467, 1991.

[15] B. N. Lundstrom and A. L. Fairhall. Decoding stimulus variance from a distributional neural code of interspike intervals. Journal of Neuroscience, 26:9030–9037, 2006.

[16] D. J. C. MacKay. Bayesian interpolation. Neural Computation, 4:415–447, 1992.

[17] T. Omi and S. Shinomoto. Optimizing time histograms for non-Poisson spike trains. Neural Computation, 23:3125–3144, 2011.

[18] L. Paninski. Log-concavity results on Gaussian process methods for supervised and unsupervised learning. In Neural Information Processing Systems, volume 17, pages 1025–1032, 2005.

[19] J. W. Pillow, L. Paninski, V. J. Uzzell, E. P. Simoncelli, and E. J. Chichilnisky. Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. Journal of Neuroscience, 23:11003–11013, 2005.

[20] M. N. Shadlen and W. T. Newsome. The variable discharge of cortical neurons: Implications for connectivity, computation, and information coding. Journal of Neuroscience, 18:3870–3896, 1998.

[21] S. Shinomoto, H. Kim, T. Shimokawa, N. Matsuno, S. Funahashi, K. Shima, I. Fujita, H. Tamura, T. Doi, K. Kawano, N. Inaba, K. Fukushima, S. Kurkin, K. Kurata, M. Taira, K. Tsutsui, H. Komatsu, T. Ogawa, K. Koida, J. Tanji, and K. Toyama. Relating neuronal firing patterns to functional differentiation of cerebral cortex. PLoS Computational Biology, 5:e1000433, 2009.

[22] S. Shinomoto, K. Shima, and J. Tanji. Differences in spiking patterns among cortical neurons. Neural Computation, 15:2823–2842, 2003.

[23] T. Shintani and S. Shinomoto. Detection limit for rate fluctuations in inhomogeneous Poisson processes. Physical Review E, 85:041139, 2012.

[24] W. R. Softky and C. Koch. The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. Journal of Neuroscience, 13:334–350, 1993.", "award": [], "sourceid": 114, "authors": [{"given_name": "Shinsuke", "family_name": "Koyama", "institution": null}]}