{"title": "Neurometric function analysis of population codes", "book": "Advances in Neural Information Processing Systems", "page_first": 90, "page_last": 98, "abstract": "The relative merits of different population coding schemes have mostly been analyzed in the framework of stimulus reconstruction using Fisher Information. Here, we consider the case of stimulus discrimination in a two alternative forced choice paradigm and compute neurometric functions in terms of the minimal discrimination error and the Jensen-Shannon information to study neural population codes. We first explore the relationship between minimum discrimination error, Jensen-Shannon Information and Fisher Information and show that the discrimination framework is more informative about the coding accuracy than Fisher Information, as it defines an error for any pair of possible stimuli. In particular, it includes Fisher Information as a special case. Second, we use the framework to study population codes of angular variables. Specifically, we assess the impact of different noise correlation structures on coding accuracy in long versus short decoding time windows. For long time windows we use the common Gaussian noise approximation; to address the case of short time windows we analyze the Ising model with identical noise correlation structure. In this way, we provide a new rigorous framework for assessing the functional consequences of noise correlation structures for the representational accuracy of neural population codes that is in particular applicable to short-time population coding.", "full_text": "Neurometric function analysis of population codes\n\nPhilipp Berens, Sebastian Gerwinn, Alexander S.
Ecker and Matthias Bethge\n\nMax Planck Institute for Biological Cybernetics\n\nCenter for Integrative Neuroscience, University of T\u00fcbingen\n\nComputational Vision and Neuroscience Group\nSpemannstrasse 41, 72076 T\u00fcbingen, Germany\n\nfirst.last@tuebingen.mpg.de\n\nAbstract\n\nThe relative merits of different population coding schemes have mostly been analyzed in the framework of stimulus reconstruction using Fisher Information. Here, we consider the case of stimulus discrimination in a two alternative forced choice paradigm and compute neurometric functions in terms of the minimal discrimination error and the Jensen-Shannon information to study neural population codes. We first explore the relationship between minimum discrimination error, Jensen-Shannon Information and Fisher Information and show that the discrimination framework is more informative about the coding accuracy than Fisher Information, as it defines an error for any pair of possible stimuli. In particular, it includes Fisher Information as a special case. Second, we use the framework to study population codes of angular variables. Specifically, we assess the impact of different noise correlation structures on coding accuracy in long versus short decoding time windows. For long time windows we use the common Gaussian noise approximation; to address the case of short time windows we analyze the Ising model with identical noise correlation structure. In this way, we provide a new rigorous framework for assessing the functional consequences of noise correlation structures for the representational accuracy of neural population codes that is in particular applicable to short-time population coding.\n\n1 Introduction\n\nThe relative merits of different population coding schemes have mostly been studied (e.g.
[1, 12], for a review see [2]) in the framework of stimulus reconstruction (figure 1a), where the performance of a code is judged on the basis of the mean squared error E[(\u03b8 \u2212 \u02c6\u03b8)^2]. That is, if a stimulus \u03b8 is encoded by a population of N neurons with tuning curves f_i, we ask how well, on average, an estimator can reconstruct the true value of the presented stimulus based on the neural responses r, which were generated by the density p(r|\u03b8). The average reconstruction error can be written as\n\nE_{\u03b8,r}[(\u03b8 \u2212 \u02c6\u03b8(r))^2] = E_\u03b8[Var_{\u02c6\u03b8|\u03b8}] + E_\u03b8[b_\u03b8^2].\n\nHere Var_{\u02c6\u03b8|\u03b8} = E_r[(\u03b8 \u2212 \u02c6\u03b8(r))^2 | \u03b8] denotes the error variance and b_\u03b8 = E_r[\u02c6\u03b8(r)|\u03b8] \u2212 \u03b8 the bias of the estimator \u02c6\u03b8. For the sake of analytical tractability, most studies have employed Fisher Information (FI) (e.g. [1, 12])\n\nJ_\u03b8 = E[\u2212(\u2202^2/\u2202\u03b8^2) log p(r|\u03b8) | \u03b8]\n\nto bound the conditional error variance Var_{\u02c6\u03b8|\u03b8} of an unbiased estimator from below according to the Cramer-Rao bound:\n\nVar_{\u02c6\u03b8|\u03b8} \u2265 1/J_\u03b8.\n\nFigure 1: Illustration of the two frameworks for studying population codes. a. In stimulus reconstruction, an estimator tries to reconstruct the orientation of a stimulus based on a noisy neural response. The quality of a code is based on the average error of this estimator. b. In stimulus discrimination, an ideal observer needs to choose one of two possible stimuli based on a noisy neural response (2AFC task). c. A neurometric function shows the error E as a function of \u2206\u03b8, the difference between a reference direction \u03b8_1 and a second direction \u03b8_2.
This framework is often used in psychophysical studies.\n\nFor the comparison of different coding schemes, it is important that an estimator exists which can actually attain this lower bound. For short time windows and certain types of tuning functions, this may not always be the case [4]. In particular, it is unclear how different population coding schemes affect the fidelity with which a population of binary neurons can encode a stimulus variable.\n\n1.1 A new approach for the analysis of population coding\n\nHere we view the population coding problem from a different perspective: We consider the case of stimulus discrimination in a two alternative forced choice paradigm (2AFC, figure 1b) with equally probable stimuli and compute two natural measures of coding accuracy: (1) the minimal discrimination error E(\u03b8_1, \u03b8_2) of an ideal observer classifying a stimulus s based on the response distribution as either being \u03b8_1 or \u03b8_2 and (2) the Jensen-Shannon information I_JS between the response distributions p(r|\u03b8_1) and p(r|\u03b8_2). The minimal discrimination error is achieved by the Bayes optimal classifier \u02c6\u03b8 = argmax_s p(s|r), where s \u2208 {\u03b8_1, \u03b8_2} and the prior distribution p(s) = 1/2. It is given by\n\nE(\u03b8_1, \u03b8_2) = \u222b min(p(s = \u03b8_1, r), p(s = \u03b8_2, r)) dr = (1/2) \u222b min(p(r|\u03b8_1), p(r|\u03b8_2)) dr (1)\n\nand the Jensen-Shannon Information [13] is defined as\n\nI_JS(\u03b8_1, \u03b8_2) = (1/2) D_KL[p(r|\u03b8_1) \u2016 p(r)] + (1/2) D_KL[p(r|\u03b8_2) \u2016 p(r)], (2)\n\nwhere p(r) = \u2211_{s \u2208 {\u03b8_1, \u03b8_2}} p(s) p(r|s) = (1/2)(p(r|\u03b8_1) + p(r|\u03b8_2)) is the arithmetic average between the two densities, which in our case is the same as the marginal distribution. D_KL[q_1 \u2016 q_2] = \u222b q_1(x) log(q_1(x)/q_2(x)) dx is the Kullback-Leibler divergence.
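For a small discrete response space, equations 1 and 2 can be evaluated exactly, which makes for a convenient sanity check. A minimal sketch; the two response distributions below are made up for illustration and are not the population model used later in the paper:

```python
import numpy as np

# Two hypothetical conditional response distributions p(r|theta_1), p(r|theta_2)
# over four discrete response patterns; the numbers are illustrative only.
p1 = np.array([0.50, 0.30, 0.15, 0.05])
p2 = np.array([0.05, 0.15, 0.30, 0.50])

# Equation 1: minimal discrimination error of the Bayes optimal classifier
# with equal priors, E = (1/2) * sum_r min(p(r|theta_1), p(r|theta_2)).
E = 0.5 * np.minimum(p1, p2).sum()

# Equation 2: Jensen-Shannon information with the equal-weight mixture p(r).
pm = 0.5 * (p1 + p2)

def dkl(q1, q2):
    # Kullback-Leibler divergence, in bits
    return float(np.sum(q1 * np.log2(q1 / q2)))

I_js = 0.5 * dkl(p1, pm) + 0.5 * dkl(p2, pm)
print(E, I_js)
```

For these toy numbers E = 0.2 and I_js is about 0.35 bits, consistent with the upper bound of equation 4, E \u2264 1/2 \u2212 I_JS/2.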
I_JS is an interesting measure of coding accuracy since it directly measures the mutual information between the neural responses and the \u2018class label\u2019, i.e. the stimulus identity. By observing a population response pattern r, the uncertainty (in terms of entropy) about the stimulus is reduced by\n\nMI(r, s) = \u2211_s p(s) \u222b p(r|s) log( p(r|s) / \u2211_{s\u2032} p(r|s\u2032) p(s\u2032) ) dr = I_JS,\n\nwith prior distribution as above. In the following, we will restrict our analysis to the special case of shift-invariant population codes for angular variables and compute neurometric functions E(\u2206\u03b8) and I_JS(\u2206\u03b8) (figure 1c) by setting \u03b8_1 = \u03b8 and \u03b8_2 = \u03b8 + \u2206\u03b8. In the limit of large populations, the dependence of these curves on \u03b8 can be ignored.\n\nFigure 2: a. Illustration of equation 5: The entropy H[E] (black) intersects 1 \u2212 I_JS (grey) at E\u2217 (dashed). Because of Fano\u2019s inequality, E > E\u2217. b. Functional form of the bounds in equations 4 and 5 (black). Our lower bound is tighter than the lower bound proposed in [13] (grey). c. Illustration of the connections between the proposed measures of coding accuracy. The minimal discrimination error E(\u2206\u03b8) (red) is shown as a neurometric curve as a function of \u2206\u03b8 and is bounded in terms of the Jensen-Shannon information I_JS(\u2206\u03b8) via equations 4 and 5 (black). Fisher Information links to E via equation 3 and the bounds imposed by I_JS (grey). This approximation is only valid for small \u2206\u03b8.
The computations have been carried out for a population of N = 50 neurons, with average correlations \u00af\u03c1 = 0.15 and correlation structure as in figure 3e.\n\n1.2 Computing E and I_JS\n\nWhile the integrals in equations (1) and (2) often cannot be solved in closed form, they are relatively easy to evaluate numerically using Monte-Carlo techniques [10]. For the minimal discrimination error, we use\n\nE(\u2206\u03b8) = (1/2) \u222b min(p(r|\u03b8), p(r|\u03b8 + \u2206\u03b8)) dr \u2248 (1/2M) \u2211_{i=1}^{M} min(p(r^{(i)}|\u03b8), p(r^{(i)}|\u03b8 + \u2206\u03b8)) / p(r^{(i)}),\n\nwhere r^{(i)} is one of M samples drawn from the mixture distribution p(r) = (1/2)(p(r|\u03b8) + p(r|\u03b8 + \u2206\u03b8)). To approximate I_JS, we evaluate each D_KL term separately as\n\nD_KL[p(r|\u03b8) \u2016 p(r)] = \u222b p(r|\u03b8) log(p(r|\u03b8)/p(r)) dr \u2248 (1/M) \u2211_{i=1}^{M} [log p(r^{(i)}|\u03b8) \u2212 log p(r^{(i)})],\n\nwhere we draw the samples r^{(i)} from p(r|\u03b8). We use an analogous expression for D_KL[p(r|\u03b8 + \u2206\u03b8) \u2016 p(r)] and plug these estimates into equation 2. This scheme provides consistent estimates of the desired quantities. For all simulations below we used M = 10^5 samples.\n\n2 Links between the proposed measures\n\nIn this section, we link the Fisher Information J_\u03b8 of a population code p(r|\u03b8) to the minimum discrimination error E(\u2206\u03b8) and the Jensen-Shannon Information I_JS(\u2206\u03b8) in the 2AFC paradigm. First, we link Fisher Information to Jensen-Shannon information I_JS.
Second, we bound the minimum discrimination error in terms of the Jensen-Shannon information.\n\n2.1 From Fisher Information to Jensen-Shannon Information\n\nIn order to obtain a relationship between I_JS and Fisher Information, we use an expression already derived in [7], where p(r|\u03b8 + \u2206\u03b8) is expanded up to second order in \u2206\u03b8, which yields:\n\nI_JS(\u2206\u03b8) \u2248 (1/8)(\u2206\u03b8)^2 J_\u03b8. (3)\n\nFigure 3: Illustration of the model. Tuning functions: a. Cosine-type tuning functions with rates between 5 and 50 Hz. b. Box-like tuning functions with matched minimal and maximal firing rates. Cosine tuning functions resemble the orientation tuning functions of many cortical neurons. They are characterized by approximately constant Fisher Information independent of the stimulus orientation. Box-like tuning functions, in contrast, have non-constant Fisher Information due to their steep non-linearity. They have been shown to exhibit superior performance over cosine-like tuning functions with respect to the mean squared error [4]. Correlation matrices: c. stimulus-independent, no limited range (SI, \u03b1 = \u221e), d. stimulus-independent, limited range (SI, \u03b1 = 2), e. stimulus-dependent, no limited range (SD, \u03b1 = \u221e), f. stimulus-dependent, limited range (SD, \u03b1 = 2)\n\nTherefore, Fisher Information provides a good approximation of the Jensen-Shannon Information for sufficiently small \u2206\u03b8.\n\n2.2 From Jensen-Shannon Information to Minimal Discrimination Error\n\nThe minimal discrimination error E(\u2206\u03b8) of an ideal observer is bounded from above and below in terms of I_JS(\u2206\u03b8).
An upper bound derived by [13] is given by\n\nE(\u2206\u03b8) \u2264 1/2 \u2212 (1/2) I_JS(\u2206\u03b8). (4)\n\nNext, we derive a new lower bound on E, which is tighter than a bound derived by Lin [13]. To this end, we observe that from Fano\u2019s inequality [8] it follows that\n\nH[E] \u2265 H[s|r] \u2212 E log(|s| \u2212 1)\n= H[s|r]\n= H[s] \u2212 MI[r, s]\n= 1 \u2212 I_JS(\u2206\u03b8), (5)\n\nwhere H[E] is the entropy of a Bernoulli distribution with p = E. The equality from the first to the second line follows as the number of stimuli or classes |s| = 2. Since the entropy is monotonic in E on the interval [0, 0.5], we have the lower bound E \u2265 E\u2217, where E\u2217 is chosen such that equality holds. For an illustration, see figure 2a. The shape of both bounds, as well as Lin\u2019s lower bound, is illustrated in figure 2b.\n\nIn figure 2c we show the minimal discrimination error for a population code (red) together with the upper and lower bound (black) obtained by inserting I_JS(\u2206\u03b8) into equations 4 and 5. Both bounds follow the neurometric function E(\u2206\u03b8) nicely. For comparison, we also show the upper and lower bound obtained by plugging Fisher Information into equation 3 and computing the bounds 4 and 5 based on this approximation of I_JS(\u2206\u03b8) (grey). Clearly, the approximation is valid for small \u2206\u03b8 and becomes successively worse for large ones.\n\nFigure 4: Comparison of box-like (red) vs. cosine (black) tuning functions in short-term population codes of a. N = 10, b. N = 50, c. N = 250 independent neurons. Although box-like tuning functions are much broader than cosine tuning functions, E_box usually lies below E_cos. For the cosine case, FI (dashed, approximation as in figure 2c) and E_d\u2032 (grey) provide accurate accounts of coding accuracy.
In contrast, FI grossly overestimates the discrimination error for box-like tuning functions in small and medium sized populations. In this case, E_d\u2032 is only a good approximation of E in the range where \u2206\u03b8 is small (dark red). Beyond this point, it underestimates E (a, b). For N = 250, bounds are not shown for clarity, but they capture the true behaviour of E better than in figures 4a and b.\n\n2.3 Previous work\n\nOnly a small number of studies on neural population coding have used measures other than Fisher Information [18, 3, 6, 4]. Two approaches are most closely related to ours: Snippe and Koenderink [18] and Averbeck and Lee [3] used a measure analogous to the sensitivity index d\u2032,\n\n(d\u2032)^2 = \u2206\u00b5\u22a4 \u03a3^{\u22121} \u2206\u00b5, \u2206\u00b5 := f(\u03b8 + \u2206\u03b8) \u2212 f(\u03b8), (6)\n\nas a measure of coding accuracy. While Snippe and Koenderink have considered only the limit \u2206\u03b8 \u2192 0, Averbeck and Lee evaluated equation 6 for finite \u2206\u03b8 using \u03a3 = (1/2)(\u03a3_\u03b8 + \u03a3_{\u03b8+\u2206\u03b8}) and converted d\u2032 to a discrimination error E_d\u2032 = 1 \u2212 erf(d\u2032/2). This approximation is exact only if the class-conditional distribution p(r|\u03b8) is Gaussian with fixed covariance \u03a3_\u03b8 = \u03a3 for all \u2206\u03b8. In that particular case, the entire neurometric function is fully determined by the Fisher Information [9]:\n\nd\u2032 = (\u2206\u03b8) \u221aJ_\u03b8 = (\u2206\u03b8) \u221aJ_mean,\n\nwhere J_mean is the linear part of the Fisher Information (cf. equation 7). In the general case, it is not obvious what aspects of the quality of a population code are captured by the above measure. Therefore, both Fisher Information and the class-conditional second-order approximation used by Averbeck and Lee have shortcomings: The latter does not account for information originating from changes in the covariance matrix as is quantified by J_cov (cf.
equation 7). Fisher Information, on the other hand, can be quite uninformative about the coding accuracy of the population, especially when the tuning functions are highly nonlinear (see figure 3) or noise is large, as in these cases it is not certain whether the Cramer-Rao bound can actually be attained [4]. The examples studied in the next section demonstrate how these shortcomings can be overcome using the minimal discrimination error (equation 1).\n\n3 Results\n\nAfter describing the population model used in this study, we will first illustrate in a simple example how our proposed framework is more informative than previous approaches. Second, we will investigate how different noise correlation structures impact population coding on different timescales.\n\n3.1 The population model\n\nIn this section, we describe in detail the population model used in the remainder of the study. To facilitate comparability, we closely follow the model used in a recent study by Josic et al. [12] where applicable. We consider a population of N neurons tuned to orientation, where the firing rate of neuron i follows an average tuning profile f_i(\u03b8) with (a) a cosine-like shape\n\nf_i(\u03b8) = \u03bb_1 + \u03bb_2 a^k(\u03b8 \u2212 \u03c6_i),\n\nwith k = 1 in section 3.2 and k = 6 in section 3.3 and a(\u03c6) = (1/2)(1 + cos(\u03c6)), or (b) a box-like shape\n\nf_i(\u03b8) = \u03bb_1 + (\u03bb_2/2) (|cos(\u03b8 \u2212 \u03c6_i)|^{1/j} \u00b7 sgn(cos(\u03b8 \u2212 \u03c6_i)) + 1).\n\nHere, \u03c6_i is the preferred orientation of neuron i and we use j = 12. We consider two scenarios:\n\n1.
Long-term coding: r(\u03b8) \u223c N(f(\u03b8), \u03a3(\u03b8)), where the trial-to-trial fluctuations are assumed to be normally distributed with mean f(\u03b8) and covariance matrix \u03a3(\u03b8).\n\n2. Short-term coding: r(\u03b8) \u223c I(f(\u03b8), \u03a3(\u03b8)), where r_i \u2208 {0, 1} and I(\u00b5, \u03a3) is the maximum entropy distribution consistent with the constraints provided by \u00b5 and \u03a3, the Ising model [16]. That is, for short-term population coding, we assume the population activity to be binary, with each neuron either emitting one spike or none. The parameters of the Ising model were computed using gradient descent on the log likelihood.\n\nFollowing Josic et al. [12], we model the stimulus-dependent covariance matrix as \u03a3_ij(\u03b8) = \u03b4_ij v_i(\u03b8) + (1 \u2212 \u03b4_ij) \u03c1_ij(\u03b8) \u221a(v_i(\u03b8) v_j(\u03b8)), where v_i(\u03b8) is the variance of cell i and \u03c1_ij(\u03b8) the correlation coefficient. For long-term coding, we set v_i(\u03b8) = f_i(\u03b8) and for short-term coding, we set v_i(\u03b8) = f_i(\u03b8)(1 \u2212 f_i(\u03b8)). We allow for both stimulus and spatial influences on \u03c1 by setting \u03c1_ij(\u03b8) = \u03c3_ij(\u03b8) c(\u03c6_i \u2212 \u03c6_j), where \u03c6_i is the preferred orientation of neuron i. The function \u03c3 models the influence of the stimulus, while the function c models the spatial component of the correlation structure. We use \u03c3_ij(\u03b8) = \u03c3_i(\u03b8) \u03c3_j(\u03b8), where \u03c3_i(\u03b8) = \u03ba_1 + \u03ba_2 a^2(\u03b8). We set c(\u03c6_i \u2212 \u03c6_j) = C exp(\u2212|\u03c6_i \u2212 \u03c6_j|/\u03b1), where \u03b1 controls the length of the spatial decay.
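The tuning-curve and covariance construction above can be sketched in a few lines for the long-term (Gaussian) case. This is a rough illustration only: all parameter values are ad-hoc guesses rather than the paper's settings, and the stimulus factor is assumed here to depend on \u03b8 \u2212 \u03c6_i:

```python
import numpy as np

# Rough sketch of the section 3.1 population model (long-term coding).
# All parameter values are illustrative guesses, not the paper's settings.
N = 50
phi = np.linspace(0.0, np.pi, N, endpoint=False)  # preferred orientations
lam1, lam2, k = 5.0, 45.0, 1                      # tuning baseline/modulation (Hz)
kap1, kap2 = 0.1, 0.1                             # stimulus dependence of rho
alpha, C = 2.0, 1.0                               # spatial decay and scale

def a(x):
    return 0.5 * (1.0 + np.cos(x))

def tuning(theta):
    # cosine-like tuning: f_i(theta) = lam1 + lam2 * a(theta - phi_i)^k
    return lam1 + lam2 * a(theta - phi) ** k

def covariance(theta):
    v = tuning(theta)                        # long-term coding: v_i = f_i
    sig = kap1 + kap2 * a(theta - phi) ** 2  # stimulus factor (an assumption here)
    spatial = C * np.exp(-np.abs(phi[:, None] - phi[None, :]) / alpha)
    rho = (sig[:, None] * sig[None, :]) * spatial
    Sigma = rho * np.sqrt(np.outer(v, v))
    np.fill_diagonal(Sigma, v)               # Sigma_ii = v_i
    return v, Sigma

f, Sigma = covariance(0.3)
print(Sigma.shape)
```

Note that |\u03c6_i \u2212 \u03c6_j| is used literally here, as in the text; for angular variables one may prefer a circular distance.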
To obtain a desired mean level of correlation \u00af\u03c1, we use the method described in [12].\n\n3.2 Minimum discrimination error is more informative than Fisher Information\n\nAs has been pointed out in [4], the shape of unimodal tuning functions can strongly influence the coding accuracy of population codes of angular variables. In particular, box-like tuning functions can be superior to cosine tuning functions. However, numerical evaluation of the minimum mean squared error for angular variables is much more difficult than the evaluation of the minimal discrimination error proposed here, and the above claim has only been verified up to N = 20 neurons. Here we compute the full neurometric functions for N = 10, 50, 250 binary neurons (figure 4). In this way, we show that the advantage of box-like tuning functions also holds for large numbers of neurons (compare red and black curves in figure 4a-c). In addition, we note that Fisher Information does not provide an accurate account of the performance of box-like tuning functions: it fails as soon as the nonlinearity in the tuning functions becomes effective and overestimates the true minimal discrimination error E. Similarly, the approximate neurometric functions E_d\u2032(\u2206\u03b8) obtained from equation 6 do not capture the shape of the neurometric functions E(\u2206\u03b8) but underestimate the minimal discrimination error. In contrast, the deviation between both curves stays rather small for cosine tuning functions.\n\n3.3 Stimulus-dependent correlations have opposite effects for long- and short-term population coding\n\nThe shape of the noise covariance matrix \u03a3_\u03b8 can strongly influence the coding fidelity of a neural population. In order to evaluate these effects it is important to take differences in the noise covariance for different stimuli into account.
In this section, we will use our new framework to study different noise correlation structures for short- and long-term population coding.\n\nPrevious studies have investigated the effect of noise correlations in the long-term case: Most studies assumed p(r|\u03b8) to follow a multivariate Gaussian distribution, so that firing rates r|\u03b8 \u223c N(f(\u03b8), \u03a3(\u03b8)) (for a detailed description of the population model see section 3.1).\n\nFigure 5: Neurometric functions E(\u2206\u03b8) (a-c) and I_JS(\u2206\u03b8) (d-f) for four different noise correlation structures. a. and d. Large population (N = 100) and long-term coding. b. and e. Medium sized population (N = 15) and long-term coding. The inset is a magnification for clarity. c. and f. Medium sized population (N = 15) and short-term coding. The impact of stimulus-dependent noise correlations in the absence of limited range correlations changes from b/e to c/f (red line). While they are beneficial in long-term coding, they are beneficial in short-term coding only for close angles. The exact point of this transition is not the same for E and I_JS, since they are only related via the bounds described in section 2.2. Note that the scale of the x-axis varies.\n\nIn this case, the FI of the population takes a particularly simple form. It can be decomposed into\n\nJ_\u03b8 = J_mean + J_cov, J_mean = f\u2032\u22a4 \u03a3^{\u22121} f\u2032, J_cov = (1/2) Tr[\u03a3\u2032 \u03a3^{\u22121} \u03a3\u2032 \u03a3^{\u22121}], (7)\n\nwhere we omit the dependence on \u03b8 for clarity and f\u2032, \u03a3\u2032 are the derivatives of f and \u03a3 with respect to \u03b8. J_mean and J_cov are the Fisher information when either only the mean or only the covariance is assumed to depend on \u03b8.
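The decomposition in equation 7 is easy to evaluate numerically for a Gaussian population. A sketch using finite differences for f\u2032 and \u03a3\u2032, with a toy cosine-tuned population and fixed limited-range correlations (the parameter values are invented for illustration, not taken from the paper):

```python
import numpy as np

# Numerical sketch of equation 7 for r|theta ~ N(f(theta), Sigma(theta)).
# Toy model with illustrative parameters, not the paper's model.
N = 20
phi = np.linspace(0.0, np.pi, N, endpoint=False)

def f(theta):
    return 5.0 + 45.0 * 0.5 * (1.0 + np.cos(theta - phi))

def Sigma(theta):
    v = f(theta)  # Poisson-like variance v_i = f_i
    rho = 0.15 * np.exp(-np.abs(phi[:, None] - phi[None, :]) / 2.0)
    S = rho * np.sqrt(np.outer(v, v))
    np.fill_diagonal(S, v)
    return S

def fisher_decomposition(theta, h=1e-5):
    df = (f(theta + h) - f(theta - h)) / (2 * h)          # f prime
    dS = (Sigma(theta + h) - Sigma(theta - h)) / (2 * h)  # Sigma prime
    Sinv = np.linalg.inv(Sigma(theta))
    J_mean = float(df @ Sinv @ df)                        # f'^T Sigma^-1 f'
    J_cov = 0.5 * float(np.trace(dS @ Sinv @ dS @ Sinv))  # (1/2) Tr[S' S^-1 S' S^-1]
    return J_mean, J_cov

J_mean, J_cov = fisher_decomposition(0.7)
print(J_mean, J_cov)
```

Together with equation 3, J_\u03b8 = J_mean + J_cov then gives the small-\u2206\u03b8 approximation I_JS(\u2206\u03b8) \u2248 (\u2206\u03b8)^2 J_\u03b8 / 8.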
For this case, various studies have investigated noise structures where correlations were either uniform across the population (figure 3c) or their magnitude decayed with the difference in preferred orientations (figure 3d; \u2018limited range structure\u2019 or \u2018spatial decay\u2019, see e.g. [1]). Only recently have stimulus-dependent correlations been analyzed in terms of Fisher information [12]. Josic et al. find that in the absence of limited range correlations, stimulus-dependent noise correlations (figure 3e) are beneficial for a population code, while in their presence (figure 3f), they are detrimental.\n\nWe first compute the neurometric functions E(\u2206\u03b8) and I_JS(\u2206\u03b8) for a population of 100 neurons in the case of long-term coding with a Gaussian noise model for the four possible noise correlation structures (figure 5a). We corroborate the results of Josic et al. in that we find that the lowest E or the highest I_JS is achieved for a population with stimulus-dependent noise correlations and no limited range structure, while a population with stimulus-dependent noise correlations in the presence of spatial decay performs worst. Spatially uniform correlations (figure 3c) provide almost as good a code as the best coding scheme.\n\nNext, we directly compare long- and short-term population coding in a population of 15 neurons\u00b9. For short-term coding, we assume that the population activity is of binary nature, i.e. each neuron spikes at most once. Again, we compute neurometric functions E(\u2206\u03b8) and I_JS(\u2206\u03b8) for all four possible correlation structures.
The results for long-term coding do not differ between large and small populations (figure 5b), although the relative differences between the coding schemes are less prominent. In contrast, we find that the beneficial impact of stimulus-dependent correlations in the absence of limited range structure reverses in short-term codes for large \u2206\u03b8 (figure 5c).\n\n4 Discussion\n\nIn this paper, we introduce the computation of neurometric functions as a new framework for studying the representational accuracy of neural population codes. Importantly, it allows for a rigorous treatment of nonlinear population codes (e.g. box-like tuning functions) and of noise correlations for non-Gaussian noise models. This is particularly important for binary population codes on timescales where neurons fire at most one spike. Such codes are of special interest since psychophysical experiments have demonstrated that efficient computations can be performed in cortex on short time scales [19]. Previous studies have mostly focussed on long-term population codes, since in this case it is possible to study many questions analytically using Fisher Information. Although the structure of neural population activity on short timescales has recently attracted much interest [16, 17, 15], population codes for binary population activity and, in particular, the impact of different noise correlation structures on such codes are not well understood. In contrast to previous work [14], neurometric function analysis allows for a comprehensive treatment of both short- and long-term population codes in a single framework. In section 3.3, we have started to study population codes on short timescales and found important differences in the effect of noise correlations between short- and long-term population codes.
In the future, we will extend these results to much larger populations by adapting new techniques for the approximate fitting of Ising models [15].\n\nThe example discussed in section 3.2 demonstrates that neurometric functions can provide additional information compared to Fisher Information: While Fisher Information is a single number for each potential population code, neurometric functions in terms of E or I_JS assess the coding quality for each pair of stimuli. This also enables us to detect effects like the dependence of the relative performance of different population codes on \u2206\u03b8, as shown in figures 5c and f. We can furthermore easily extend the framework to take unequal prior probabilities into account. In equations 1 and 2 we have assumed equal prior probabilities p(\u03b8_1) = p(\u03b8_2) = 1/2. Both E and I_JS, however, are also well defined if this is not the case.\n\nThe framework of stimulus discrimination in a 2AFC task has long been used in psychophysical and neurophysiological studies for measuring the accuracy of orientation coding in the visual system (e.g. [5, 21]). It is therefore appealing to use the same framework in theoretical investigations of neural population coding, since this facilitates the comparison with experimental data. Furthermore, it allows studying population codes for categorical variables since, in contrast to Fisher Information, it does not require the variable of interest to be continuous. This is an advantage, as many neurophysiological studies investigate the encoding of categories, such as objects [11] or numbers [20].\n\nAcknowledgments\n\nWe thank A. Tolias and J. Cotton for discussions. This work has been supported by the Bernstein award to MB (BMBF; FKZ: 01GQ0601) and a scholarship of the German National Academic Foundation to PB.\n\n\u00b9We are limited in the number of neurons as fitting the required Ising model is computationally very expensive.
For the present purpose, we chose N = 15, which is sufficient to demonstrate our point.\n\nReferences\n\n[1] L. F. Abbott and Peter Dayan. The effect of correlated variability on the accuracy of a population code. Neural Comp., 11(1):91\u2013101, 1999.\n\n[2] B. B. Averbeck, P. E. Latham, and A. Pouget. Neural correlations, population coding and computation. Nat Rev Neurosci, 7(5):358\u2013366, 2006.\n\n[3] B. B. Averbeck and D. Lee. Effects of noise correlations on information encoding and decoding. J Neurophysiol, 95(6):3633\u20133644, 2006.\n\n[4] M. Bethge, D. Rotermund, and K. Pawelzik. Optimal short-term population coding: When Fisher information fails. Neural Comp., 14(10):2317\u20132351, 2002.\n\n[5] A. Bradley, B. C. Skottun, I. Ohzawa, G. Sclar, and R. D. Freeman. Visual orientation and spatial frequency discrimination: a comparison of single neurons and behavior. J Neurophysiol, 57(3):755\u2013772, 1987.\n\n[6] N. Brunel and J. P. Nadal. Mutual information, Fisher information, and population coding. Neural Computation, 10(7):1731\u20131757, 1998.\n\n[7] M. Casas, P. W. Lamberti, A. Plastino, and A. R. Plastino. Jensen-Shannon divergence, Fisher information, and Wootters\u2019 hypothesis. arXiv preprint quant-ph/0407147, 2004.\n\n[8] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2006.\n\n[9] P. Dayan and L. F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2001.\n\n[10] J. R. Hershey and P. A. Olsen. Approximating the Kullback-Leibler divergence between Gaussian mixture models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 4, pages IV-317\u2013IV-320, 2007.\n\n[11] C. P. Hung, G. Kreiman, T. Poggio, and J. J. DiCarlo. Fast readout of object identity from macaque inferior temporal cortex. Science, 310(5749):863\u2013866, 2005.\n\n[12] K.
Josic, E. Shea-Brown, B. Doiron, and J. de la Rocha. Stimulus-dependent correlations and population codes. Neural Computation, 21(10):2774\u20132804, 2009.\n\n[13] J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145\u2013151, 1991.\n\n[14] S. Panzeri, A. Treves, S. Schultz, and E. T. Rolls. On decoding the responses of a population of neurons from short time windows. Neural Computation, 11(7):1553\u20131577, 1999.\n\n[15] Y. Roudi, J. Tyrcha, and J. Hertz. The Ising model for neural data: Model quality and approximate methods for extracting functional connectivity. Phys. Rev. E, 79:051915, February 2009.\n\n[16] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007\u20131012, 2006.\n\n[17] J. Shlens, G. D. Field, J. L. Gauthier, M. Greschner, A. Sher, A. M. Litke, and E. J. Chichilnisky. The structure of large-scale synchronized firing in primate retina. Journal of Neuroscience, 29(15):5022, 2009.\n\n[18] H. Snippe and J. Koenderink. Information in channel-coded systems: correlated receivers. Biological Cybernetics, 67(2):183\u2013190, June 1992.\n\n[19] S. Thorpe, D. Fize, and C. Marlot. Speed of processing in the human visual system. Nature, 381(6582):520\u2013522, 1996.\n\n[20] O. Tudusciuc and A. Nieder. Neuronal population coding of continuous and discrete quantity in the primate posterior parietal cortex. Proceedings of the National Academy of Sciences of the United States of America, 104(36):14513\u20138, 2007.\n\n[21] P. Vazquez, M. Cano, and C. Acuna. Discrimination of line orientation in humans and monkeys.
J Neurophysiol, 83(5):2639\u20132648, 2000.", "award": [], "sourceid": 252, "authors": [{"given_name": "Philipp", "family_name": "Berens", "institution": null}, {"given_name": "Sebastian", "family_name": "Gerwinn", "institution": null}, {"given_name": "Alexander", "family_name": "Ecker", "institution": null}, {"given_name": "Matthias", "family_name": "Bethge", "institution": null}]}