{"title": "Modeling Short-term Noise Dependence of Spike Counts in Macaque Prefrontal Cortex", "book": "Advances in Neural Information Processing Systems", "page_first": 1233, "page_last": 1240, "abstract": "Correlations between spike counts are often used to analyze neural coding. The noise is typically assumed to be Gaussian. Yet, this assumption is often inappropriate, especially for low spike counts. In this study, we present copulas as an alternative approach. With copulas it is possible to use arbitrary marginal distributions such as Poisson or negative binomial that are better suited for modeling noise distributions of spike counts. Furthermore, copulas place a wide range of dependence structures at the disposal and can be used to analyze higher order interactions. We develop a framework to analyze spike count data by means of copulas. Methods for parameter inference based on maximum likelihood estimates and for computation of Shannon entropy are provided. We apply the method to our data recorded from macaque prefrontal cortex. The data analysis leads to three significant findings: (1) copula-based distributions provide better fits than discretized multivariate normal distributions; (2) negative binomial margins fit the data better than Poisson margins; and (3) a dependence model that includes only pairwise interactions overestimates the information entropy by at least 19% compared to the model with higher order interactions.", "full_text": "Modeling Short-term Noise Dependence of Spike Counts in Macaque Prefrontal Cortex\n\nArno Onken\n\nTechnische Universit\u00e4t Berlin / BCCN Berlin\n\nSteffen Gr\u00fcnew\u00e4lder\n\nTechnische Universit\u00e4t Berlin\n\nFranklinstr. 28/29, 10587 Berlin, Germany\n\naonken@cs.tu-berlin.de\n\ngruenew@cs.tu-berlin.de\n\nMatthias Munk\n\nMPI for Biological Cybernetics\n\nSpemannstr.
38, 72076 T\u00fcbingen, Germany\n\nmatthias.munk@tuebingen.mpg.de\n\nKlaus Obermayer\n\nTechnische Universit\u00e4t Berlin / BCCN Berlin\n\noby@cs.tu-berlin.de\n\nAbstract\n\nCorrelations between spike counts are often used to analyze neural coding. The noise is typically assumed to be Gaussian. Yet, this assumption is often inappropriate, especially for low spike counts. In this study, we present copulas as an alternative approach. With copulas it is possible to use arbitrary marginal distributions, such as the Poisson or the negative binomial distribution, that are better suited for modeling noise distributions of spike counts. Furthermore, copulas place a wide range of dependence structures at our disposal and can be used to analyze higher order interactions. We develop a framework to analyze spike count data by means of copulas. Methods for parameter inference based on maximum likelihood estimates and for computation of mutual information are provided. We apply the method to our data recorded from macaque prefrontal cortex. The data analysis leads to three findings: (1) copula-based distributions provide significantly better fits than discretized multivariate normal distributions; (2) negative binomial margins fit the data significantly better than Poisson margins; and (3) the dependence structure carries up to 12% of the mutual information between stimuli and responses.\n\n1 Introduction\n\nUnderstanding neural coding is at the heart of theoretical neuroscience. Analyzing spike counts of a population is one way to gain insight into neural coding properties. Even when the same stimulus is presented repeatedly, the responses of the neurons vary, i.e. from trial to trial the responses are subject to noise. The noise variations of neighboring neurons are typically correlated (noise correlations). Due to their relevance for neural coding, noise correlations have been the subject of a considerable number of studies (see [1] for a review).
However, these studies always assumed Gaussian noise. Thus, correlated spike rates were generally modeled by multivariate normal distributions with a specific covariance matrix that describes all pairwise linear correlations.\n\nFor long time intervals or high firing rates, the average number of spikes is sufficiently large for the central limit theorem to apply, and thus the normal distribution is a good approximation of the spike count distributions. However, several experimental findings suggest that noise correlations as well as sensory information processing predominantly take place on a shorter time scale, on the order of tens to hundreds of milliseconds [2, 3]. It is therefore questionable whether the normal distribution is still an appropriate approximation and whether the results of studies based on Gaussian noise apply to short time intervals and low firing rates.\n\n[Figure 1, panels (a) to (d); panels (b) to (d) plot N2 [#Spikes/Bin] against N1 [#Spikes/Bin].]\n\nFigure 1: (a): Recording of correlated spike trains from two neurons and conversion to spike counts. (b): The distribution of the spike counts of a neuron pair from the data described in Section 4 for 100 ms time bins. Dark squares represent a high number of occurrences of the corresponding pairs of spike counts. The spike counts are correlated, as the frequencies are high near the diagonal.
The distributions of the individual spike counts are plotted below and to the left of the axes. (c): Density of a fit with a bivariate normal distribution. (d): Distribution of a fit with negative binomial margins coupled with the Clayton copula.\n\nThese doubts stem from several major drawbacks of the multivariate normal distribution: (1) Its margins are continuous with a symmetric shape, whereas empirical distributions of real spike counts tend to have a positive skew, i.e. the mass of the distribution is concentrated on the left, with a longer tail to the right. Moreover, the normal distribution allows negative values, which are not meaningful for spike counts. Especially for low rates, this can become a major issue, since the probability of negative values will be high. (2) The dependence structure of a multivariate normal distribution is always elliptical, whereas spike counts of short time bins can have a bulb-shaped dependence structure (see Fig. 1b). (3) The multivariate normal distribution does not allow higher order correlations of its elements; only pairwise correlations can be modeled. It was shown that pairwise interactions are sufficient for retinal ganglion cells and cortex cells in vitro [4]. However, there is evidence that they are insufficient for subsequent cortex areas in vivo [5]. We will show that our data recorded in prefrontal cortex suggest that higher order interactions (which involve more than two neurons) do play an important role in the prefrontal cortex as well.\n\nIn this paper, we present a method that addresses the above shortcomings of the multivariate normal distribution. We apply copulas [6] to form multivariate distributions with a rich set of dependence structures and discrete marginal distributions, including the Poisson distribution. Copulas were previously applied to model the distribution of continuous first-spike latencies [7].
Here we apply this concept to spike counts.\n\n2 Copulas\n\nWe give an informal introduction to copulas and apply the concept to a pair of neurons from our data, which are described and fully analyzed in Section 4. Formal details of copulas follow in Section 3.2.\n\nA copula is a cumulative distribution function that can couple arbitrary marginal distributions. There are many families of copulas, each with a different dependence structure. Some families have an elliptical dependence structure, similar to the multivariate normal distribution. However, it is also possible to use completely different dependence structures that are more appropriate for the data at hand.\n\nAs an example, consider the modeling of spike count dependencies of two neurons (Fig. 1). Spike trains are recorded from the neurons and transformed to spike counts (Fig. 1a). Counting leads to a bivariate empirical distribution (Fig. 1b). The distribution of the counts depends on the length of the time bin that is used to count the spikes, here 100 ms. In the case considered, the correlation at low counts is higher than at high counts. This is called lower tail dependence.\n\nThe density of a typical population model based on the multivariate normal (MVN) distribution is shown in Fig. 1c. Here, we did not discretize the distribution, since the standard approach to investigating noise correlations also uses the continuous distribution [1]. The mean and covariance matrix of the MVN distribution correspond to the sample mean and the sample covariances of the empirical distribution. Yet, the dependence structure of the MVN model does not reflect the true dependence structure of the counts. In contrast, the spike count probabilities of a copula-based distribution (Fig. 1d) correspond well to the empirical distribution in Fig. 1b.\n\nThe modeling of spike count data with the help of a copula is done in three steps: (1) A marginal distribution, e.g.
a Poisson or a negative binomial distribution, is chosen based on the spike count distribution of the individual neurons. (2) The counts are transformed to probabilities using the cumulative distribution function of the marginal distribution. (3) The probabilities, and thereby the cumulative marginal distributions, are coupled with the help of a so-called copula function. As an example, consider the Clayton copula family [6]. For two variables the copula is given by\n\n$$C(p_1, p_2, \\alpha) = \\frac{1}{\\sqrt[\\alpha]{\\max\\left\\{\\frac{1}{p_1^{\\alpha}} + \\frac{1}{p_2^{\\alpha}} - 1,\\; 0\\right\\}}},$$\n\nwhere $p_i$ denotes the probability that the spike count $X_i$ of the $i$th neuron is less than or equal to $r_i$ (i.e. $p_i = P(X_i \\le r_i)$). Note that there are generalizations to more than two margins (see Section 3.2). The function $C(p_1, p_2, \\alpha)$ generates a joint cumulative distribution function by coupling the margins and thereby introduces correlations of second and higher order between the spike count variables. The ratio of the joint probability that corresponds to statistically independent spike counts, $P(X_1 \\le r_1, X_2 \\le r_2) = p_1 p_2$, and the joint probability under the dependence introduced by the Clayton copula (for $\\frac{1}{p_1^{\\alpha}} + \\frac{1}{p_2^{\\alpha}} - 1 \\ge 0$) is given by\n\n$$\\frac{p_1 p_2}{C(p_1, p_2, \\alpha)} = p_1 p_2 \\sqrt[\\alpha]{\\frac{1}{p_1^{\\alpha}} + \\frac{1}{p_2^{\\alpha}} - 1} = \\sqrt[\\alpha]{p_1^{\\alpha} + p_2^{\\alpha} - p_1^{\\alpha} p_2^{\\alpha}}.$$\n\nSuppose that $\\alpha$ is positive. Since $p_i \\in [0, 1]$, the deviation from the ratio 1 will be larger for small probabilities. Thus, the copula generates correlations whose strengths depend on the magnitude of the probabilities. The probability mass function (Fig. 1d) can then be calculated from the cumulative probability using the difference scheme described in Section 3.4.
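The three steps and the difference scheme can be illustrated with a short Python sketch. This is a minimal example built on our own assumptions and is not the code used for the analyses in this paper. It couples two Poisson margins with the bivariate Clayton copula. The rates, the value of alpha, and all function names are hypothetical.

```python
import math


def poisson_cdf(r, lam):
    # Step (2): cumulative probability P(X <= r) of a Poisson margin with mean lam.
    if r < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for k in range(1, int(math.floor(r)) + 1):
        term *= lam / k  # P(X = k) obtained from P(X = k - 1)
        total += term
    return total


def clayton(p1, p2, alpha):
    # Step (3): bivariate Clayton copula C(p1, p2, alpha); this sketch assumes alpha > 0.
    if alpha <= 0:
        raise ValueError('this sketch only covers alpha > 0')
    if p1 <= 0.0 or p2 <= 0.0:
        return 0.0
    # For alpha > 0 and p1, p2 in (0, 1] the base is >= 1, so max{., 0} is inactive.
    return (p1 ** -alpha + p2 ** -alpha - 1.0) ** (-1.0 / alpha)


def joint_cdf(r1, r2, lam1, lam2, alpha):
    # Couple the two marginal cdfs: F(r1, r2) = C(F1(r1), F2(r2)).
    return clayton(poisson_cdf(r1, lam1), poisson_cdf(r2, lam2), alpha)


def joint_pmf(r1, r2, lam1, lam2, alpha):
    # Difference scheme (inclusion-exclusion for d = 2): P(X1 = r1, X2 = r2).
    def f(a, b):
        return joint_cdf(a, b, lam1, lam2, alpha)
    return f(r1, r2) - f(r1 - 1, r2) - f(r1, r2 - 1) + f(r1 - 1, r2 - 1)


def independence_ratio(p1, p2, alpha):
    # Closed form of the ratio p1 * p2 / C(p1, p2, alpha) stated in the text.
    return (p1 ** alpha + p2 ** alpha - (p1 * p2) ** alpha) ** (1.0 / alpha)


if __name__ == '__main__':
    lam1, lam2, alpha = 2.0, 3.0, 1.5  # hypothetical rates and copula parameter
    for r1 in range(3):
        print([round(joint_pmf(r1, r2, lam1, lam2, alpha), 4) for r2 in range(3)])
    # The ratio is far below 1 for small probabilities and close to 1 near p = 1.
    print(independence_ratio(0.05, 0.05, alpha), independence_ratio(0.95, 0.95, alpha))
```

Summing `joint_pmf` over a large grid of counts gives approximately 1, because the double sum telescopes to the joint cdf evaluated far in the upper tail. Switching to negative binomial margins would only change step (2). The copula and the difference scheme would stay the same, which is the separation of margins and dependence structure that Section 3.4 relies on.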
Care must be taken whenever copulas are applied to form discrete distributions. For continuous distributions, typical measures of dependence are determined by the copula function $C$ alone. In the discrete case, however, these measures are also affected by the shape of the marginal distributions [8].\n\n3 Parametric spike count models and model selection procedure\n\nWe will now describe the formal aspects of the multivariate normal distribution on the one hand and of copula-based models as the proposed alternative on the other hand, both in terms of their application to spike counts.\n\n3.1 The discretized multivariate normal distribution\n\nThe MVN distribution is continuous and needs to be discretized (and rectified) before it can be applied to spike count data (which are discrete and non-negative). The cumulative distribution function (cdf) of the spike count vector $\\vec{X}$ is then given by\n\n$$F_{\\vec{X}}(r_1, \\dots, r_d) = \\begin{cases} \\Phi_{\\mu,\\Sigma}(\\lfloor r_1 \\rfloor, \\dots, \\lfloor r_d \\rfloor), & \\text{if } \\forall i \\in \\{1, \\dots, d\\}: r_i \\ge 0, \\\\ 0, & \\text{otherwise,} \\end{cases}$$\n\nwhere $\\lfloor \\cdot \\rfloor$ denotes the floor operation for the discretization, $\\Phi_{\\mu,\\Sigma}$ denotes the cdf of the MVN distribution with mean $\\mu$ and covariance matrix $\\Sigma$, and $d$ denotes the dimension of the multivariate distribution, which corresponds to the number of neurons that are modeled. Note that $\\mu$ is no longer the mean of $\\vec{X}$: the mean is shifted to greater values as $\\Phi_{\\mu,\\Sigma}$ is rectified (negative values are cut off). This deviation grows with the dimension $d$. According to the central limit theorem, the distribution of spike counts approaches the MVN distribution only for large counts.\n\n3.2 Copula-based models\n\nFormally, a copula $C$ is a cdf with uniform margins. It can be used to couple marginal cdfs $F_{X_1}, \\dots, F_{X_d}$ to form a joint cdf $F_{\\vec{X}}$, such that\n\n$$F_{\\vec{X}}(r_1, \\dots, r_d) = C(F_{X_1}(r_1), \\dots, F_{X_d}(r_d))$$\n\nholds [6]. There are many families of copulas with different dependence shapes and different numbers of parameters, e.g. the multivariate Clayton copula family with a scalar parameter $\\alpha$:\n\n$$C_{\\alpha}(\\vec{u}) = \\left(\\max\\left(1 - d + \\sum_{i=1}^{d} u_i^{-\\alpha},\\; 0\\right)\\right)^{-1/\\alpha}.$$\n\nThus, for a given realization $\\vec{r}$, which can represent the counts of $d$ neurons, we can set $u_i = F_{X_i}(r_i)$ and $F_{\\vec{X}}(\\vec{r}) = C_{\\alpha}(\\vec{u})$, where the $F_{X_i}$ can be arbitrary univariate cdfs. Thereby, we can generate a multivariate distribution with specific margins $F_{X_i}$ and a dependence structure determined by $C$. In the case of discrete marginal distributions, however, typical measures of dependence, such as the linear correlation coefficient or Kendall's $\\tau$, are affected by the shape of these margins [8]. Note that $\\alpha$ controls not only the strength of pairwise interactions but also the degree of higher order interactions.\n\nAnother copula family is the Farlie-Gumbel-Morgenstern (FGM) copula [6]. It is special in that it has $2^d - d - 1$ parameters that individually determine the pairwise and higher order interactions. Its cdf takes the form\n\n$$C_{\\vec{\\alpha}}(\\vec{u}) = \\left(1 + \\sum_{k=2}^{d} \\sum_{1 \\le j_1 < \\dots < j_k \\le d} \\alpha_{j_1 j_2 \\dots j_k} \\prod_{i=1}^{k} (1 - u_{j_i})\\right) \\prod_{i=1}^{d} u_i,$$\n\nsubject to the constraints\n\n$$1 + \\sum_{k=2}^{d} \\sum_{1 \\le j_1 < \\dots < j_k \\le d} \\alpha_{j_1 j_2 \\dots j_k} \\prod_{i=1}^{k} \\varepsilon_{j_i} \\ge 0 \\quad \\text{for all } \\varepsilon_1, \\varepsilon_2, \\dots, \\varepsilon_d \\in \\{-1, 1\\}.$$\n\nWe only have pairwise interactions if we set all but the first $\\binom{d}{2}$ parameters to zero. Hence, we can easily investigate the impact of higher order interactions on the model fit. Due to the constraints for $\\vec{\\alpha}$, the correlations that the FGM copula can model are small in terms of their absolute value. Nevertheless, this is not an issue for modeling noise dependencies of spike counts of a small number of neurons, since the noise correlations that are found experimentally are typically small (see e.g. [2]).\n\n3.3 Marginal distributions\n\nCopulas allow us to use different marginal distributions. Typically, the Poisson distribution is a good approximation to spike count variations of single neurons [9]. For this distribution the cdfs of the margins take the form\n\n$$F_{X_i}(r; \\lambda_i) = \\sum_{k=0}^{\\lfloor r \\rfloor} \\frac{\\lambda_i^k}{k!} e^{-\\lambda_i},$$\n\nwhere $\\lambda_i$ is the mean spike count of neuron $i$ for a given bin size. We will also use the negative binomial distribution as a generalization of the Poisson distribution:\n\n$$F_{X_i}(r; \\lambda_i, \\upsilon_i) = \\sum_{k=0}^{\\lfloor r \\rfloor} \\frac{\\lambda_i^k}{k!} \\frac{1}{\\left(1 + \\frac{\\lambda_i}{\\upsilon_i}\\right)^{\\upsilon_i}} \\frac{\\Gamma(\\upsilon_i + k)}{\\Gamma(\\upsilon_i)(\\upsilon_i + \\lambda_i)^k},$$\n\nwhere $\\Gamma$ is the gamma function. The additional parameter $\\upsilon_i$ controls the degree of overdispersion: the Fano factor equals $1 + \\lambda_i / \\upsilon_i$, so the smaller the value of $\\upsilon_i$, the greater the Fano factor. As $\\upsilon_i$ approaches infinity, the negative binomial distribution converges to the Poisson distribution.\n\n3.4 Inference for copulas and discrete margins\n\nLikelihoods of discrete vectors can be computed by applying the inclusion-exclusion principle of Poincar\u00e9 and Sylvester. For this purpose we define the sets $A = \\{X_1 \\le r_1, \\dots, X_d \\le r_d\\}$ and $A_i = \\{X_1 \\le r_1, \\dots, X_d \\le r_d, X_i \\le r_i - 1\\}$, $i \\in \\{1, \\dots, d\\}$. The probability of a realization $\\vec{r}$ is given by\n\n$$P_{\\vec{X}}(\\vec{r}) = P\\left(A \\setminus \\bigcup_{i=1}^{d} A_i\\right) = P(A) - \\sum_{k=1}^{d} (-1)^{k-1} \\sum_{\\substack{I \\subseteq \\{1, \\dots, d\\} \\\\ |I| = k}} P\\left(\\bigcap_{i \\in I} A_i\\right) = F_{\\vec{X}}(\\vec{r}) - \\sum_{k=1}^{d} (-1)^{k-1} \\sum_{\\substack{\\vec{m} \\in \\{0,1\\}^d \\\\ \\sum_i m_i = k}} F_{\\vec{X}}(r_1 - m_1, \\dots, r_d - m_d). \\tag{1}$$\n\nThus, we can compute the probability mass of a realization $\\vec{r}$ using only the cdf of $\\vec{X}$. Since copulas separate the margins from the dependence structure, an efficient inference procedure is feasible. Let\n\n$$l_i(\\theta_i) = \\sum_{t=1}^{T} \\log P_{X_i}(r_{i,t}; \\theta_i), \\quad i = 1, \\dots, d,$$\n\ndenote the univariate marginal log likelihoods. Note that we assume independent time bins. Further, let\n\n$$l(\\vec{\\alpha}, \\theta_1, \\dots, \\theta_d) = \\sum_{t=1}^{T} \\log P_{\\vec{X}}(\\vec{r}_t; \\vec{\\alpha}, \\theta_1, \\dots, \\theta_d)$$\n\nbe the log likelihood of the joint distribution, where $\\vec{\\alpha}$ denotes the parameter of the copula. The so-called inference for margins (IFM) method proceeds in two steps [10]. First, the marginal likelihoods are maximized separately:\n\n$$\\hat{\\theta}_i = \\arg\\max_{\\theta_i} \\{l_i(\\theta_i)\\}.$$\n\nThen, the full likelihood is maximized given the estimated margin parameters:\n\n$$\\hat{\\vec{\\alpha}} = \\arg\\max_{\\vec{\\alpha}} \\{l(\\vec{\\alpha}, \\hat{\\theta}_1, \\dots, \\hat{\\theta}_d)\\}.$$\n\nThe estimator is asymptotically efficient and close to the maximum likelihood estimator [10].\n\n3.5 Estimation of mutual information\n\nThe mutual information [11] of dependent spike counts $\\vec{X}$ is a measure of the information that knowing the neural response $\\vec{r}$ provides about the stimulus. It can be written as\n\n$$I(\\vec{X}; S) = \\sum_{s \\in M_S} P_S(s) \\sum_{\\vec{r} \\in \\mathbb{N}^d} P_{\\vec{X}}(\\vec{r} \\mid s) \\left(\\log_2 P_{\\vec{X}}(\\vec{r} \\mid s) - \\log_2 \\sum_{s' \\in M_S} P_S(s') P_{\\vec{X}}(\\vec{r} \\mid s')\\right),$$\n\nwhere $S$ is the stimulus random variable, $M_S$ is the set of stimuli, and $P_S$ is the probability mass function of the stimuli. The likelihood $P_{\\vec{X}}(\\vec{r} \\mid s)$ of $\\vec{r}$ given $s$ can be calculated using Equation 1. Hence, $I(\\vec{X}; S)$ can be estimated by the Monte Carlo method.\n\n4 Application to multi-electrode recordings\n\nWe now apply our parametric count models to the analysis of spike data, which we recorded from the prefrontal cortex of an awake behaving macaque using a 4 \u00d7 4 tetrode array.\n\nExperimental setup.
Activity was recorded while the monkey performed a visual match-to-sample task. The task involved matching of 20 visual stimuli (fruits and vegetables) that were presented for approximately 650 ms each. After an initial presentation (\u201csample\u201d), a test stimulus (\u201ctest\u201d) was presented with a delay of 3 seconds, and the monkey had to decide by differential button press whether both stimuli were the same or not. Correct responses were rewarded. Match and non-match trials were presented in random order with equal probability.\n\nWe recorded from the lateral prefrontal cortex in a 2 \u00d7 2 mm\u00b2 area around the ventral bank of the principal sulcus. Recordings were performed simultaneously from up to 16 adjacent sites with an array of individually movable fiber micro-tetrodes (manufactured by Thomas Recording). Data were sampled at 32 kHz and bandpass filtered between 0.5 kHz and 10 kHz. Recording positions of individual electrodes were chosen to maximize the recorded activity and the signal quality.\n\nThe recorded data were processed by a PCA-based spike sorting method. The method provides automatic cluster cutting, which was manually corrected by subsequent cluster merging if indicated by quantitative criteria such as the ISI histograms or amplitude stability.\n\nData set. To select neurons with stimulus-specific responses, we calculated spike counts from their spike trains. A neuron was accepted for the dependence analysis only if its mean firing rate, averaged over the sample stimulus presentation interval, differed by at least 6.5 Hz from that in the pre-stimulus interval. A total of six neurons fulfilled this criterion (each recorded from a different tetrode).
With this criterion we can assume that the selected neurons are indeed related to the processing of the stimulus information.\n\nSpike trains were separated into 80 groups, one for each combination of the 20 stimuli and the four trial intervals: pre-stimulus, sample stimulus presentation, delay, and test stimulus presentation. Afterwards, the trains were binned into successive 100 ms intervals and converted to six-dimensional spike counts for each bin. Due to the different interval lengths, total sample sizes of the groups were between 224 and 1793 count vectors. A representative example of the empirical distribution of a pair of these counts from the stimulus presentation interval is presented in Fig. 1b.\n\nModel fitting. The discretized MVN distribution as well as several copula-based distributions were fitted to the data. For each of the 80 groups, we randomly selected 50 count vectors as a test set to obtain an unbiased estimate of the likelihoods. We trained the models on the remainder of each group (training set).\n\nA commonly applied criterion for model selection is maximum entropy [4]. This criterion selects the model with minimal complexity (i.e. maximal entropy) subject to given constraints. It thereby performs regularization, which is supposed to prevent overfitting. Copulas, on the other hand, typically increase the complexity of the model and thus decrease the entropy. However, our evaluation takes place on a separate test set and hence takes overfitting into account.\n\nParameter inference for the discretized MVN distribution (see Section 3.1) was performed by computing the sample mean and sample covariance matrix of the spike counts, which is the standard procedure for analyzing noise correlations [1]. Note that this estimator is biased, since it is not the maximum likelihood solution for the discretized distribution.\n\nThe following copula families were used to construct noise distributions of the spike counts.
The Clayton (see Section 3.2), Gumbel-Hougaard, Frank, and Ali-Mikhail-Haq copula families served as examples of families with one parameter [6], and the FGM copula family served as an example with a variable number of parameters (see Section 3.2).\n\nWe applied the IFM method for copula inference (see Section 3.4). The sample mean is the maximum likelihood estimator for $\\lambda_i$ for both the Poisson and the negative binomial margins. The maximum likelihood estimates for $\\upsilon_i$ were computed iteratively by Newton's method. Depending on whether the copula parameters were constrained, either the Nelder-Mead simplex method for unconstrained nonlinear optimization or the line-search algorithm for constrained nonlinear optimization was applied to estimate the copula parameters.\n\nFigure 2: Evaluation of the IFM estimates on the test set and estimated mutual information. (a): Log likelihoods for the discretized multivariate normal distribution, the best-fitting copula-based model with Poisson margins, and the best-fitting copula-based model with negative binomial margins, averaged over the 20 different stimuli. (b): Difference between the log likelihood of the model with independent counts and negative binomial margins (\u201cind. model\u201d) and the log likelihoods of different copula-based models with negative binomial margins, averaged over the 20 different stimuli. (c): Mutual information between stimuli and responses for the Clayton-based model with negative binomial margins. (d): Normalized difference between the mutual information for the Clayton-based model with negative binomial margins and that for the corresponding \u201cind. model\u201d.\n\nResults for different distributions. Fig. 2 shows the evaluation of the IFM estimates on the test set. The likelihood for the copula-based models is significantly larger than that for the discretized MVN model ($p = 2 \\cdot 10^{-14}$, paired-sample Student's t test over stimuli).
Moreover, the likelihood for the negative binomial margins is even larger than that for the Poisson margins ($p = 0.0003$).\n\nWe estimated the impact of neglecting higher order interactions on the fit by using different numbers of parameters for the FGM copula. For the 2nd order model, we set all but the first $\\binom{d}{2}$ parameters to zero, therefore leaving only parameters for pairwise interactions. In contrast, for the 3rd order model, we set all but the first $\\binom{d}{2} + \\binom{d}{3}$ parameters to zero. For our $d = 6$ neurons, this leaves 15 and 35, respectively, of the $2^6 - 6 - 1 = 57$ FGM parameters.\n\nWe computed the difference between the likelihood of the model with dependence and that of the corresponding model with independent counts. Fig. 2b shows this difference for several copulas and negative binomial margins evaluated on the test set. The model based on the Clayton copula family provides the best fit. The fit is significantly better than that of the second-best-fitting copula family ($p = 0.0014$). In spite of having more parameters, the FGM copulas perform worse. However, the FGM model with third order interactions fits the data significantly better than the model that includes only pairwise interactions ($p = 0.0437$).\n\nCopula coding analysis. Fig. 2c shows the Monte Carlo estimate of the mutual information based on the Clayton-based model with negative binomial margins and IFM parameters determined on the training set for each of the intervals. For the test stimulus interval, the estimation was performed twice: for the previously presented sample stimulus and for the test stimulus. The Monte Carlo method was terminated when the standard error was below $5 \\cdot 10^{-4}$.
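A Monte Carlo estimator with this stopping rule can be sketched as follows. This is our own minimal illustration and is not the code behind Fig. 2c. The paper fits a six-dimensional model with negative binomial margins; the sketch instead uses two neurons with Poisson margins coupled by a bivariate Clayton copula. It draws responses by conditional inversion of the Clayton copula and evaluates the likelihoods with the difference scheme of Equation 1. Sampling stops once the standard error of the running mean falls below a tolerance. The stimulus rates, the uniform stimulus prior, and all function names are hypothetical.

```python
import math
import random


def poisson_cdf(r, lam):
    # P(X <= r) for a Poisson margin with mean lam (0 for r < 0).
    if r < 0:
        return 0.0
    term = total = math.exp(-lam)
    for k in range(1, r + 1):
        term *= lam / k
        total += term
    return total


def poisson_quantile(u, lam):
    # Smallest r with P(X <= r) >= u; the term > 0 guard stops at float underflow.
    r = 0
    term = total = math.exp(-lam)
    while total < u and term > 0.0:
        r += 1
        term *= lam / r
        total += term
    return r


def clayton(p1, p2, alpha):
    # Bivariate Clayton copula for alpha > 0.
    if p1 <= 0.0 or p2 <= 0.0:
        return 0.0
    return (p1 ** -alpha + p2 ** -alpha - 1.0) ** (-1.0 / alpha)


def pmf(r, lams, alpha):
    # P(X1 = r1, X2 = r2 | s) via the difference scheme (Equation 1 with d = 2).
    def f(a, b):
        return clayton(poisson_cdf(a, lams[0]), poisson_cdf(b, lams[1]), alpha)
    r1, r2 = r
    p = f(r1, r2) - f(r1 - 1, r2) - f(r1, r2 - 1) + f(r1 - 1, r2 - 1)
    return max(p, 1e-300)  # guard against a rounding-induced zero before log2


def sample_counts(lams, alpha, rng):
    # Draw (U1, U2) from the Clayton copula by conditional inversion, then
    # map each uniform through the Poisson quantile function.
    u1 = 1.0 - rng.random()  # in (0, 1], avoids raising 0 to a negative power
    w = 1.0 - rng.random()
    u2 = (u1 ** -alpha * (w ** (-alpha / (1.0 + alpha)) - 1.0) + 1.0) ** (-1.0 / alpha)
    return poisson_quantile(u1, lams[0]), poisson_quantile(u2, lams[1])


def mutual_information_mc(stimuli, prior, alpha, tol=5e-4, min_samples=1000,
                          max_samples=10 ** 6, seed=0):
    # Average of log2 P(r|s) - log2 sum_s' P(s') P(r|s') over s ~ P_S, r ~ P(r|s).
    rng = random.Random(seed)
    n, mean, m2, se = 0, 0.0, 0.0, float('inf')
    while n < max_samples:
        s = rng.choices(range(len(stimuli)), weights=prior)[0]
        r = sample_counts(stimuli[s], alpha, rng)
        p_cond = [pmf(r, lams, alpha) for lams in stimuli]
        p_mix = sum(w * p for w, p in zip(prior, p_cond))
        x = math.log2(p_cond[s]) - math.log2(p_mix)
        n += 1
        delta = x - mean  # Welford update of the running mean and variance
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples:
            se = math.sqrt(m2 / (n - 1) / n)  # standard error of the mean
            if se < tol:
                break
    return mean, se, n


if __name__ == '__main__':
    stimuli = [(1.0, 1.5), (3.0, 4.0)]  # hypothetical mean counts per stimulus
    mi, se, n = mutual_information_mc(stimuli, [0.5, 0.5], alpha=1.5, tol=1e-2)
    print(f'I(X;S) ~ {mi:.3f} bits (standard error {se:.4f}, {n} samples)')
```

The standard error shrinks like the standard deviation of the per-sample log ratio divided by the square root of the sample count. For a given variance, the required number of samples therefore grows with the inverse square of the tolerance. For that reason the sketch exposes `tol` and `max_samples` as parameters instead of fixing the tolerance at 5e-4.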
The mutual information is higher during the stimulus presentation intervals than during the delay interval.\n\nWe estimated the information increase due to the dependence structure by computing the mutual information for the Clayton-based model with negative binomial margins and subtracting the (smaller) mutual information for the corresponding distribution with independent elements. Fig. 2d shows this information estimate, $\\Delta I_{\\mathrm{shuffled}}$, normalized to the mutual information for the Clayton-based model. The dependence structure carries up to 12% of the mutual information. During the test stimulus interval, it carries almost twice as much information about the test stimulus as about the previously presented sample stimulus.\n\nAnother important measure related to stimulus decoding, which is currently under debate, is $\\Delta I / I$ [12]. This measure provides an upper bound on the information loss for stimulus decoding based on the distribution that assumes independence. We find that one loses at most 19.82% of the information for the Clayton-based model.\n\n5 Conclusion\n\nWe developed a framework for analyzing the noise dependence of spike counts. Applying it to our data from the macaque prefrontal cortex, we found that: (1) Gaussian noise is inadequate to model spike count data for short time intervals; (2) negative binomial distributed margins describe the individual spike counts better than Poisson distributed margins; and (3) higher order interactions are present and play a substantial role in terms of model fit and information content.\n\nThe substantial role of higher order interactions poses a challenge for theoreticians as well as experimentalists. The complexity of taking all higher order interactions into account grows exponentially with the number of neurons, a manifestation of the curse of dimensionality.
Based on our findings, we conclude that one needs to deal with this problem to analyze short-term coding in higher cortical areas.\n\nIn summary, the copula-based approach provides a convenient way to study spike count dependencies for small population sizes (< 20 neurons). At present, the approach is computationally too demanding for larger numbers of neurons. Approximate inference methods might provide a solution to the computational problem and seem worthwhile to investigate. Directions for future research include the exploration of other copula families and the validation of population coding principles that were obtained under the assumption of Gaussian noise.\n\nAcknowledgments. This work was supported by BMBF grant 01GQ0410.\n\nReferences\n\n[1] B. B. Averbeck, P. E. Latham, and A. Pouget, Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7:358\u2013366, 2006.\n\n[2] W. Bair, E. Zohary, and W. T. Newsome, Correlated firing in macaque visual area MT: time scales and relationship to behavior. Journal of Neuroscience, 21(5):1676\u20131697, 2001.\n\n[3] A. Kohn and M. A. Smith, Stimulus dependence of neuronal correlation in primary visual cortex of the macaque. Journal of Neuroscience, 25(14):3661\u20133673, 2005.\n\n[4] E. Schneidman, M. J. Berry II, R. Segev, and W. Bialek, Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440:1007\u20131012, 2006.\n\n[5] M. M. Michel and R. A. Jacobs, The costs of ignoring high-order correlations in populations of model neurons. Neural Computation, 18:660\u2013682, 2006.\n\n[6] R. B. Nelsen, An Introduction to Copulas. Springer, New York, second edition, 2006.\n\n[7] R. L. Jenison and R. A. Reale, The shape of neural dependence. Neural Computation, 16:665\u2013672, 2004.\n\n[8] C. Genest and J. Ne\u0161lehov\u00e1, A primer on discrete copulas. ASTIN Bulletin, 37:475\u2013515, 2007.\n\n[9] D. J. Tolhurst, J. A.
Movshon, and A. F. Dean, The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research, 23:775\u2013785, 1982.\n\n[10] H. Joe and J. J. Xu, The estimation method of inference functions for margins for multivariate models. Technical Report 166, Department of Statistics, University of British Columbia, 1996.\n\n[11] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. University of Illinois Press, Urbana, 1949.\n\n[12] P. E. Latham and S. Nirenberg, Synergy, redundancy, and independence in population codes, revisited. Journal of Neuroscience, 25(21):5195\u20135206, 2005.", "award": [], "sourceid": 164, "authors": [{"given_name": "Arno", "family_name": "Onken", "institution": null}, {"given_name": "Steffen", "family_name": "Gr\u00fcnew\u00e4lder", "institution": null}, {"given_name": "Matthias", "family_name": "Munk", "institution": null}, {"given_name": "Klaus", "family_name": "Obermayer", "institution": null}]}