{"title": "Evaluating neuronal codes for inference using Fisher information", "book": "Advances in Neural Information Processing Systems", "page_first": 1993, "page_last": 2001, "abstract": "Many studies have explored the impact of response variability on the quality of sensory codes. The source of this variability is almost always assumed to be intrinsic to the brain. However, when inferring a particular stimulus property, variability associated with other stimulus attributes also effectively act as noise. Here we study the impact of such stimulus-induced response variability for the case of binocular disparity inference. We characterize the response distribution for the binocular energy model in response to random dot stereograms and find it to be very different from the Poisson-like noise usually assumed. We then compute the Fisher information with respect to binocular disparity, present in the monocular inputs to the standard model of early binocular processing, and thereby obtain an upper bound on how much information a model could theoretically extract from them. Then we analyze the information loss incurred by the different ways of combining those inputs to produce a scalar single-neuron response. We find that in the case of depth inference, monocular stimulus variability places a greater limit on the extractable information than intrinsic neuronal noise for typical spike counts. Furthermore, the largest loss of information is incurred by the standard model for position disparity neurons (tuned-excitatory), that are the most ubiquitous in monkey primary visual cortex, while more information from the inputs is preserved in phase-disparity neurons (tuned-near or tuned-far) primarily found in higher cortical regions.", "full_text": "Evaluating neuronal codes for inference using Fisher\n\ninformation\n\nRalf M. 
Haefner∗ and Matthias Bethge\n\nCentre for Integrative Neuroscience, University of Tübingen,\nBernstein Center for Computational Neuroscience, Tübingen,\nMax Planck Institute for Biological Cybernetics,\nSpemannstr. 41, 72076 Tübingen, Germany\n\nAbstract\n\nMany studies have explored the impact of response variability on the quality of sensory codes. The source of this variability is almost always assumed to be intrinsic to the brain. However, when inferring a particular stimulus property, variability associated with other stimulus attributes also effectively acts as noise. Here we study the impact of such stimulus-induced response variability for the case of binocular disparity inference. We characterize the response distribution for the binocular energy model in response to random dot stereograms and find it to be very different from the Poisson-like noise usually assumed. We then compute the Fisher information with respect to binocular disparity, present in the monocular inputs to the standard model of early binocular processing, and thereby obtain an upper bound on how much information a model could theoretically extract from them. Then we analyze the information loss incurred by the different ways of combining those inputs to produce a scalar single-neuron response. We find that in the case of depth inference, monocular stimulus variability places a greater limit on the extractable information than intrinsic neuronal noise for typical spike counts. 
Furthermore, the largest loss of information is incurred by the standard model for position disparity neurons (tuned-excitatory), which are the most ubiquitous in monkey primary visual cortex, while more information from the inputs is preserved in phase-disparity neurons (tuned-near or tuned-far) primarily found in higher cortical regions.\n\n1 Introduction\n\nUnderstanding how the brain performs statistical inference is one of the main problems of theoretical neuroscience. In this paper, we propose to apply the tools developed to evaluate the information content of neuronal codes corrupted by noise to address the question of how well they support statistical inference. At the core of our approach lies the interpretation of neuronal response variability due to nuisance stimulus variability as noise.\nMany theoretical and experimental studies have probed the impact of intrinsic response variability on the quality of sensory codes ([1, 12] and references therein). However, most neurons are responsive to more than one stimulus attribute. So when trying to infer a particular stimulus property, the brain needs to be able to ignore the effect of confounding attributes that also influence the neuron's response. We propose to evaluate the usefulness of a population code for inference over a particular parameter by treating the neuronal response variability due to nuisance stimulus attributes as noise equivalent to intrinsic noise (e.g. Poisson spiking).\nWe explore the implications of this new approach for the model system of stereo vision where the inference task is to extract depth from binocular images. We compute the Fisher information present\n\n∗Corresponding author (ralf.haefner@gmail.com)\n\n\fFigure 1: Left: Example random dot stereogram (RDS). 
Right: Illustration of binocular energy model without (top) and with (bottom) phase disparity.\n\nin the monocular inputs to the standard model of early binocular processing and thereby obtain an upper bound on how precisely a model could theoretically extract depth. We compare this with the amount of information that remains after early visual processing. We distinguish the two principal model flavors that have been proposed to explain the physiological findings. We find that one of the two models appears superior to the other for inferring depth.\nWe start by giving a brief introduction to the two principal flavors of the binocular energy model. We then retrace the processing steps and compute the Fisher information with respect to depth inference that is present: first in the monocular inputs, then after binocular combination, and finally for the resulting tuning curves.\n\n2 Binocular disparity as a model system\n\nStereo vision has the advantage of a clear separation between the relevant stimulus dimension – binocular disparity – and the confounding or nuisance stimulus attributes – monocular image structure ([9]). The challenge in inferring disparity in image pairs consists in distinguishing true from false matches, regardless of the monocular structures in the two images. The stimuli that test this system in the most general way are random dot stereograms (RDS), which consist of nearly identical dot patterns in either eye (see Figure 1). The fact that parts of the images are displaced horizontally with respect to each other has been shown to be sufficient to give rise to a sensation of depth in humans and monkeys ([5, 4]). Since RDS do not contain any monocular depth cues (e.g. 
size or perspective) the brain needs to correctly match the monocular image features across eyes to compute disparity.\nThe standard model for binocular processing in primary visual cortex (V1) is the binocular energy model ([5, 10]). It explains the response of disparity-selective V1 neurons by linearly combining the output of monocular simple cells and passing the sum through a squaring nonlinearity (illustrated in Figure 1).\n\nr_even = (ν_L^e + ν_R^e)² + (ν_L^o + ν_R^o)² = (ν_L^e)² + (ν_L^o)² + (ν_R^e)² + (ν_R^o)² + 2(ν_L^e ν_R^e + ν_L^o ν_R^o)   (1)\n\nwhere ν_L^e is the output of an even-symmetric receptive field (RF) applied to the left image, ν_R^o is the output of an odd-symmetric receptive field (RF) applied to the right image, etc. By pairing an even and an odd-symmetric RF in each eye1, the monocular part of the response of the cell, (ν_L^e)² + (ν_L^o)² + (ν_R^e)² + (ν_R^o)², becomes invariant to the monocular phase of a grating stimulus (since sin² + cos² = 1) and the binocular part is modulated only by the difference (or disparity) between the phases in left and right grating – as observed for complex cells in V1. The disparity tuning curve resulting from the combination in equation (1) is even-symmetric (illustrated in Figure 1 in blue) and is one of two primary types of tuning curves found in cortex ([5]). 
In order to model the other, odd-symmetric type of tuning curves (Figure 1 in red), the filter outputs are combined such that the output of an even-symmetric filter is always combined with that of an odd-symmetric one in the other eye:\n\nr_odd = (ν_L^e + ν_R^o)² + (ν_L^o + ν_R^e)² = (ν_L^e)² + (ν_L^o)² + (ν_R^e)² + (ν_R^o)² + 2(ν_L^e ν_R^o + ν_L^o ν_R^e).   (2)\n\n1WLOG we assume the quadrature pair to consist of a purely even and a purely odd RF.\n\n\fNote that the two models are identical in their monocular inputs and the monocular part of their output (the first four terms in equations 1 and 2) and only vary in their binocular interaction terms (in brackets). The only way in which the first model can implement preferred disparities other than zero is by a positional displacement of the RFs in the two eyes with respect to each other (the disparity tuning curve achieves its maximum when the disparity in the image matches the disparity between the RFs). The second model, on the other hand, achieves non-zero preferred disparities by employing a phase shift between the left and right RF (90 deg in our case). It is therefore considered to be a phase-disparity model, while the first one is called a position disparity one.2\n\n3 Results\n\nHow much information the response of a neuron carries about a particular stimulus attribute depends both on the sensitivity of the response to changes in that attribute and on the variability (or uncertainty) in the response across all stimuli while keeping that attribute fixed. 
Fisher information is the standard way to quantify this intuition in the context of intrinsic noise ([6], but also see [2]) and we will use it to evaluate the binocular energy model mechanisms with regard to their ability to extract the disparity information contained in the monocular inputs arriving at the eyes.\n\n3.1 Response variability\n\nFigure 2 shows the mean of the binocular response of the two models. The variation of the response around the mean due to the variation in monocular image structure in the RDS is shown in Figure 3 (top row) for four exemplary disparities: −1, 0, 1 and uncorrelated (±∞), indicated in Figure 2. Unlike in the commonly assumed case of intrinsic noise, p_binoc(r|d) – the stimulus-conditioned response distribution – is far from Poisson or Gaussian. Interestingly, its mode is always at zero – the average response to uncorrelated stimuli – and the fact that the mean depends on the stimulus disparity is primarily due to the disparity-dependence of the skew of the response distribution (Figure 3).3 The skew in turn depends on the disparity through the disparity-dependent correlation between the RF outputs as illustrated in Figure 3 (bottom row). Of particular interest are the response distributions at zero disparity4, the disparities at which r_odd takes its minimum and maximum, respectively, and the uncorrelated case (infinite disparity). In the case of infinite disparity, the images in the two eyes are completely independent of each other and hence the outputs of the left and right RFs are independent Gaussians. Therefore, ν_L ν_R ∼ p_binoc(r|d = ∞) is symmetric around 0. In the case of zero disparity (identical images in left and right eye), the correlation is 1 between the outputs of left and right RFs (both even, or both odd). It follows that ν_L ν_R ∼ χ²_1 and hence has a mean of 1. 
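These response statistics can be reproduced by direct simulation. Below is a minimal sketch (not from the paper; the 1-D Gabor parameters, image size, and trial count are arbitrary illustrative choices) that applies equations (1) and (2) to white-noise images standing in for random dot patterns:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(-10, 10, 0.1)

def gabor(phase, f=0.5, sigma=2.0):
    """Illustrative 1-D Gabor RF; normalized so the filter output has unit
    variance for white-noise (random-dot-like) input."""
    g = np.cos(2 * np.pi * f * x - phase) * np.exp(-x**2 / (2 * sigma**2))
    return g / np.linalg.norm(g)

rf_e, rf_o = gabor(0.0), gabor(np.pi / 2)  # quadrature pair: even and odd RF

def mean_responses(shift, n_trials=3000):
    """Average r_even (eq. 1) and r_odd (eq. 2) over random stimuli.
    shift=None simulates uncorrelated (infinite-disparity) image pairs."""
    total = np.zeros(2)
    for _ in range(n_trials):
        left = rng.standard_normal(x.size)
        right = rng.standard_normal(x.size) if shift is None else np.roll(left, shift)
        l_e, l_o = rf_e @ left, rf_o @ left
        r_e, r_o = rf_e @ right, rf_o @ right
        total += ((l_e + r_e) ** 2 + (l_o + r_o) ** 2,  # position-disparity pairing
                  (l_e + r_o) ** 2 + (l_o + r_e) ** 2)  # phase-disparity pairing
    return total / n_trials

even_0, odd_0 = mean_responses(0)     # identical images: zero disparity
even_u, odd_u = mean_responses(None)  # uncorrelated images
```

At zero disparity the position-disparity (even) pairing roughly doubles its mean response relative to uncorrelated images, while the phase-disparity (odd) pairing does not, consistent with the blue and red curves in Figure 2.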
What is also apparent is that the binocular energy model with phase disparity (where each even-symmetric RF is paired with an odd-symmetric one) never achieves perfect correlation between the left and right eye and only covers smaller values.\n\nFigure 2: Binocular responses for even (blue) and odd (red) model.\n\n3.2 Fisher information\n\n3.2.1 Fisher information contained in monocular inputs\n\nFirst, we quantify the information contained in the inputs to the energy model by using Fisher information. Consider the 4D space spanned by the outputs of the four RFs in left and right eye: (ν_L^e, ν_L^o, ν_R^e, ν_R^o). Since the ν are drawn from identical Gaussians5, the mean responses of the\n\n2We use position disparity model and even-symmetric tuning interchangeably, as well as phase disparity model and odd-symmetric tuning. Unfortunately, the term disparity is used for both disparities between the RFs, and for disparities between left and right images (in the stimulus). If not indicated otherwise, we will always refer to stimulus disparity for the rest of the paper.\n3The RF outputs are Normally distributed in the limit of infinitely many dots (RFs act as linear filters + central limit theorem). Therefore the disparity-conditioned responses p(r|d) correspond to the off-diagonal terms in a Wishart distribution, marginalized over all the other matrix elements.\n4WLOG we assume the displacement between the RF centers in the left and right eye to be zero.\n5The model RFs have been normalized by their variance, such that var[ν] = 1 and ν ∼ N(0, 1).\n\n\fFigure 3: Response distributions p(r|d) for varying d. Top row: histograms for values of interaction terms ν_L^e ν_R^e (blue) and ν_L^e ν_R^o (red). Bottom row: distribution of corresponding RF outputs ν_L vs ν_R. 1σ curves are shown to indicate correlations. 
Blue (ν_L^e vs ν_R^e) and red (ν_L^e vs ν_R^o) colors refer to the model with even-symmetric tuning curve and odd-symmetric tuning curve, respectively. The disparity value for each column is ±∞, −1, 0 and 1, corresponding to those highlighted in Figure 2.\n\nmonocular inputs do not depend on the stimulus and hence the Fisher information is given by I(d) = (1/2) tr(C⁻¹C′C⁻¹C′) where C is the covariance matrix belonging to (ν_L^e, ν_L^o, ν_R^e, ν_R^o):\n\nC = ( 1, 0, a(d), c(d);\n      0, 1, c(−d), a(d);\n      a(d), c(−d), 1, 0;\n      c(d), a(d), 0, 1 )\n\nwhere we model the interaction terms a(d) := ⟨ν_L^e ν_R^e⟩ = ⟨ν_L^o ν_R^o⟩ and c(d) := ⟨ν_L^e ν_R^o⟩ as Gabor functions6 since Gabor functions have been shown to provide a good fit to the range of RF shapes and disparity tuning curves that are empirically observed in early sensory cortex ([5]).7 a(d) and c(d) are illustrated by the blue and red curves in Figure 2, respectively. Because the binocular part of the energy model response, or disparity tuning curve, is the convolution of the left and right RFs, the phase of the Gabor describing the disparity tuning curve is given by the difference between the phases of the corresponding RFs. Therefore c(d) is odd-symmetric and c(−d) = −c(d). We obtain\n\nI_inputs(d) = 2[(1 + a² − c²)a′² + (1 + c² − a²)c′² + 4aca′c′] / (1 − a² − c²)²   (3)\n\nwhere we omitted the stimulus dependence of a(d) and c(d) for clarity of exposition and where ′ denotes the 1st derivative with respect to the stimulus d. 
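Equation (3) can be sanity-checked against the generic zero-mean Gaussian expression I(d) = (1/2) tr(C⁻¹C′C⁻¹C′) quoted above. A sketch under assumed Gabor parameters (f and σ are illustrative values, not taken from the paper), with derivatives taken by central differences:

```python
import numpy as np

f, sigma = 0.5, 1.5  # illustrative Gabor frequency and envelope width

def a(d):  # even interaction term <nu^e_L nu^e_R>, modeled as a Gabor
    return np.cos(2 * np.pi * f * d) * np.exp(-d**2 / (2 * sigma**2))

def c(d):  # odd interaction term <nu^e_L nu^o_R>
    return np.sin(2 * np.pi * f * d) * np.exp(-d**2 / (2 * sigma**2))

def C(d):
    """Covariance of (nu^e_L, nu^o_L, nu^e_R, nu^o_R)."""
    return np.array([[1.0, 0.0, a(d), c(d)],
                     [0.0, 1.0, c(-d), a(d)],
                     [a(d), c(-d), 1.0, 0.0],
                     [c(d), a(d), 0.0, 1.0]])

def fisher_trace(d, h=1e-5):
    """Generic Gaussian Fisher information, 1/2 tr(C^-1 C' C^-1 C')."""
    Ci = np.linalg.inv(C(d))
    dC = (C(d + h) - C(d - h)) / (2 * h)
    return 0.5 * np.trace(Ci @ dC @ Ci @ dC)

def fisher_closed(d, h=1e-5):
    """Closed form of equation (3), with numerical a'(d), c'(d)."""
    ad, cd = a(d), c(d)
    da = (a(d + h) - a(d - h)) / (2 * h)
    dc = (c(d + h) - c(d - h)) / (2 * h)
    num = (1 + ad**2 - cd**2) * da**2 + (1 + cd**2 - ad**2) * dc**2 \
        + 4 * ad * cd * da * dc
    return 2 * num / (1 - ad**2 - cd**2) ** 2
```

The two quantities agree away from d = 0, where both diverge as 1 − a² − c² goes to zero.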
The denominator of equation (3) is given by det C and corresponds to the Gaussian envelope of the Gabor functions for a(d) and c(d):\n\ndet C = (1 − a² − c²)², with a² + c² = exp(−d²/σ²).\n\nIn Figure 4B (black) we plot the Fisher information as a function of disparity. We find that the Fisher information available in the inputs diverges at zero disparity (at the difference between the centers of the left and right RFs in general). This means that the ability to discriminate zero disparity from nearby disparities is arbitrarily good. In reality, intrinsic neuronal variability will limit the Fisher information at zero.8\n\n6A Gabor function is defined as cos(2πfd − φ) exp[−(d − d₀)²/(2σ²)] where f is spatial frequency, d is disparity, φ is the Gabor phase, d₀ is the envelope center (set to zero here, WLOG) and σ the envelope bandwidth.\n7The assumption that the binocular interaction can be modeled by a Gabor is not important for the principal results of this paper. In fact, the formulas for the Fisher information in the monocular inputs and in the disparity tuning curves derived below hold for other (reasonable) choices for a(d) and c(d) as well.\n\n\fFigure 4: A: Disparity tuning curves for the model using position disparity (even) and phase disparity (odd) in blue and red, respectively. B: Black: Fisher information contained in the monocular inputs. Blue: Fisher information left after combining inputs from left and right eye according to the position disparity model. Red: Fisher information after combining inputs using the phase disparity model. Note that the black and red curves diverge at zero disparity. C: Fisher information for the final model output/neuronal response. Same color code as previously. Solid lines correspond to complex, dashed lines to simple cells. D: Same as C but with added Gaussian noise in the monocular inputs.\n\n3.2.2 Combination of left and right inputs\n\nNext we analyze the information that remains after linearly combining the monocular inputs in the energy model. It follows that the 4-dimensional monocular input space is reduced to a 2-dimensional binocular one for each model, sampled by (ν_L^e + ν_R^e, ν_L^o + ν_R^o) and (ν_L^e + ν_R^o, ν_L^o + ν_R^e), respectively. Again, the marginal distributions are Gaussians with zero mean independent of stimulus disparity. This means that we can compute the Fisher information for the position disparity model from the covariance matrix C as above:\n\nC_even = ( ⟨(ν_L^e + ν_R^e)²⟩, ⟨(ν_L^e + ν_R^e)(ν_L^o + ν_R^o)⟩;\n           ⟨(ν_L^e + ν_R^e)(ν_L^o + ν_R^o)⟩, ⟨(ν_L^o + ν_R^o)²⟩ )\n        = ( 2 + 2a, 0;\n           0, 2 + 2a )\n\nHere we exploited that ⟨ν_L^e ν_L^o⟩ = ⟨ν_R^e ν_R^o⟩ = 0 since the even and odd RFs are orthogonal and that ⟨ν_L^e ν_R^o⟩ = −⟨ν_L^o ν_R^e⟩. The Fisher information follows as\n\nI_even(d) = a′(d)² / [1 + a(d)]².   (4)\n\nThe dependence of Fisher information on d is shown in Figure 4B (blue). The total information (as measured by integrating Fisher information over all disparities) communicated by the position-disparity model is greatly reduced compared to the total Fisher information present in the inputs. a(d) is an even-symmetric Gabor (illustrated in Figure 2) and hence the Fisher information is greatest on either side of the maximum where the slopes of a(d) are steepest, and zero at the center where a(d) has its peak. 
We note here that the Fisher information for the final tuning curve for the position-disparity model is the same as in equation (4) and therefore we will postpone a more detailed discussion of it until section 3.2.3.\n\n8E.g. additive Gaussian noise with variance σ_N² on the monocular filter outputs eliminates the singularity: 1 + σ_N² − a² − c² ≥ σ_N² > 0.\n\n\fOn the other hand, when combining the monocular inputs according to the phase disparity model, we find:\n\nC_odd = ( ⟨(ν_L^e + ν_R^o)²⟩, ⟨(ν_L^e + ν_R^o)(ν_L^o + ν_R^e)⟩;\n          ⟨(ν_L^e + ν_R^o)(ν_L^o + ν_R^e)⟩, ⟨(ν_L^o + ν_R^e)²⟩ )\n       = ( 2 + 2c, 2a;\n          2a, 2 − 2c )\n\nsince again ⟨ν_L^e ν_L^o⟩ = ⟨ν_R^e ν_R^o⟩ = 0 and ⟨ν_L^e ν_R^o⟩ = −⟨ν_L^o ν_R^e⟩ = c. The Fisher information in this case follows as\n\nI_odd(d) = [(1 + a² − c²)a′² + (1 + c² − a²)c′² + 4aca′c′] / (1 − a² − c²)² = (1/2) I_inputs(d).\n\nI_odd(d) is shown in Figure 4B (red). While losing 50% of the Fisher information present in the inputs, the Fisher information after combining left and right RF outputs is much larger in this case than for the position disparity model explored above. How can that be? Why are the two ways of combining the monocular outputs not symmetric? 
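The reduced 2-by-2 covariances can be checked numerically: building C_even and C_odd from the same Gabor interaction terms (f and σ are illustrative choices, not from the paper), the Gaussian Fisher formula should reproduce a′²/(1 + a)² for the even pairing (equation 4) and half of equation (3) for the odd pairing. A minimal sketch:

```python
import numpy as np

f, sigma = 0.5, 1.5  # illustrative Gabor frequency and envelope width

def a(d):  # even binocular interaction term
    return np.cos(2 * np.pi * f * d) * np.exp(-d**2 / (2 * sigma**2))

def c(d):  # odd binocular interaction term
    return np.sin(2 * np.pi * f * d) * np.exp(-d**2 / (2 * sigma**2))

def num_deriv(g, d, h=1e-5):
    return (g(d + h) - g(d - h)) / (2 * h)

def gauss_fisher(C, d, h=1e-5):
    """I(d) = 1/2 tr(C^-1 C' C^-1 C') for a zero-mean Gaussian family."""
    Ci = np.linalg.inv(C(d))
    dC = (C(d + h) - C(d - h)) / (2 * h)
    return 0.5 * np.trace(Ci @ dC @ Ci @ dC)

# Binocular 2x2 covariances after the two ways of combining the eyes.
C_even = lambda d: np.array([[2 + 2 * a(d), 0.0], [0.0, 2 + 2 * a(d)]])
C_odd = lambda d: np.array([[2 + 2 * c(d), 2 * a(d)], [2 * a(d), 2 - 2 * c(d)]])

def i_inputs(d):  # closed form of equation (3)
    ad, cd, da, dc = a(d), c(d), num_deriv(a, d), num_deriv(c, d)
    num = (1 + ad**2 - cd**2) * da**2 + (1 + cd**2 - ad**2) * dc**2 \
        + 4 * ad * cd * da * dc
    return 2 * num / (1 - ad**2 - cd**2) ** 2

d = 0.8
i_even = gauss_fisher(C_even, d)  # should match a'^2/(1+a)^2
i_odd = gauss_fisher(C_odd, d)    # should be half of equation (3)
```

This makes the asymmetry concrete: the even pairing keeps only the information in a(d), while the odd pairing retains exactly half of the input information.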
Insight into this question can be gained by looking at the binocular interaction terms in the quadratic expansion of the feature space for the two models.9 For the position disparity model we obtain the 3-dimensional space (ν_L^e ν_R^e, ν_L^o ν_R^o, ν_L^e ν_R^o + ν_L^o ν_R^e) of which the third dimension cannot contribute to the Fisher information since ⟨ν_L^e ν_R^o + ν_L^o ν_R^e⟩ = 0. In the phase-disparity model, however, the quadratic expansion yields (ν_L^e ν_R^o, ν_L^o ν_R^e, ν_L^e ν_R^e + ν_L^o ν_R^o). Here, all three dimensions are linearly independent (although correlated), each contributing to the Fisher information. This can also explain why I_odd(d) is symmetric around zero, and independent of the Gabor phase of c(d). While this is not yet a rigorous analysis of the differences between the models at the stage of binocular combination, it serves as a starting point for a future investigation.\n\n3.2.3 Disparity tuning curves\n\nIn order to collapse the 2-dimensional binocular inputs into a scalar output that can be coded in the spike rate of a neuron, the energy model postulates a squaring output nonlinearity after each linear combination and summing the results. Since the (ν_L + ν_R)² are not Normally distributed and their means depend on the stimulus disparity, we cannot employ the above approach to calculate Fisher information but instead use the more general\n\nI(d) = E[(∂/∂d ln p(r; d))²] = ∫₀^∞ p(r; d) (∂/∂d ln p(r; d))² dr   (5)\n\nwhere p(r; d) is the response distribution for stimulus disparity d. 
Because the ν are drawn from a Gaussian with variance 1, ν_L^e + ν_R^e and ν_L^o + ν_R^o are drawn from N[0, 2(1 + a(d))] since we defined a(d) = ⟨ν_L^e ν_R^e⟩ = ⟨ν_L^o ν_R^o⟩. Conditioned on d, (ν_L^e + ν_R^e)² and (ν_L^o + ν_R^o)² are independent and it follows for the model with an even-symmetric tuning function that\n\n(1/(2[1 + a(d)])) [(ν_L^e + ν_R^e)² + (ν_L^o + ν_R^o)²] ∼ χ²_2\n\nand\n\np_even(r; d) = (1/(4[1 + a(d)])) exp{−r/(4[1 + a(d)])} H(r)   (6)\n\nwhere H(r) is the Heaviside step function.10 Substituting equation (6) into equation (5) we find11\n\nI_even^complex(d) = ∫₀^∞ dr (a′(d)²/(4[1 + a(d)]³)) [r/(4[1 + a(d)]) − 1]² exp{−r/(4[1 + a(d)])} = a′(d)²/[1 + a(d)]²   (7)\n\n9By quadratic expansion of the feature space we refer to expanding a 2-dimensional feature space (f₁, f₂) to a 3-dimensional one (f₁², f₂², f₁f₂) by considering the binocular interaction terms in all quadratic forms.\n10We see that ⟨r⟩_{p_even(r;d)} = 4[1 + a(d)] and hence we recover the Gabor-shaped tuning function that we introduced in section 3.2.1 to model the empirically observed relationship between disparity d and mean spike rate r.\n11∫₀^∞ dx (x/α − 1)² exp(−x/α) = α for α > 0.\n\n\fRemarkably, this is exactly the same amount of information that is available after summing left and right RFs (see equation 4), so none is lost after squaring and combining the quadrature pair. We show I_even(d) in Figure 4C (blue). 
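The closed form in equation (7) can also be recovered by Monte Carlo: sample the complex-cell response r, whose distribution per equation (6) is exponential with mean 4[1 + a(d)], and average the squared score. A sketch with illustrative Gabor parameters (f, σ, and the evaluation point d = 1 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
f, sigma = 0.5, 1.5  # illustrative Gabor parameters

def a(d):
    return np.cos(2 * np.pi * f * d) * np.exp(-d**2 / (2 * sigma**2))

def a_prime(d, h=1e-5):
    return (a(d + h) - a(d - h)) / (2 * h)

def mc_fisher_even(d, n=400_000):
    """Monte Carlo estimate of eq. (5) for p_even(r; d) of eq. (6):
    r/(2[1+a]) ~ chi^2_2, and I(d) is the mean squared score."""
    g1 = rng.standard_normal(n)
    g2 = rng.standard_normal(n)
    r = 2 * (1 + a(d)) * (g1**2 + g2**2)  # complex-cell responses
    # score = d/dd ln p_even(r; d) = (a'/(1+a)) * (r/(4[1+a]) - 1)
    score = (a_prime(d) / (1 + a(d))) * (r / (4 * (1 + a(d))) - 1)
    return np.mean(score**2)

d = 1.0
closed = a_prime(d) ** 2 / (1 + a(d)) ** 2  # equation (7)
estimate = mc_fisher_even(d)
```

With a few hundred thousand samples the estimate agrees with equation (7) to within about a percent.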
It is also interesting to note that the general form for I_even(d) differs from the Fisher information based on the Poisson noise model (and ignoring stimulus variability as considered here) only by the exponent of 2 in the denominator. Since 1 + a(d) ≥ 0 this means that the qualitative dependence of I on d is the same, the main difference being that the Fisher information favors small over large spike rates even more. Conversely, it follows that when Fisher information only takes the neuronal noise into consideration, it greatly overestimates the information that the neuron carries with respect to the to-be-inferred stimulus parameter for realistic spike counts (of greater than two). Furthermore, unlike in the Poisson case, a scaling up of the tuning function 1 + a(d) does not translate into greater Fisher information. Fisher information with respect to stimulus variability as considered here is invariant to the absolute height of the tuning curve.12\nConsidering the phase-disparity model, ν_L^e + ν_R^o and ν_L^o + ν_R^e are drawn from N[0, 2(1 + c(d))] and N[0, 2(1 − c(d))], respectively, since c(d) = ⟨ν_L^e ν_R^o⟩ = −⟨ν_L^o ν_R^e⟩. Unfortunately, since ν_L^e + ν_R^o and ν_L^o + ν_R^e have different variances depending on d, and are usually not independent of each other, the sum cannot be modeled by a χ²-distribution. 
However, we can compute the Fisher information for the two implied binocular simple cells instead.13 It follows that\n\n(ν_L^e + ν_R^o)² / (2[1 + c(d)]) ∼ χ²_1\n\nand\n\np_odd^simple(r; d) = (1/(2Γ(1/2)√(1 + c(d)))) (1/√r) exp{−r/(4[1 + c(d)])} H(r)\n\nand14\n\nI_odd^simple(d) = ∫₀^∞ dr (c′(d)²/(2Γ(1/2)[1 + c(d)]^(5/2))) (1/√r) [r/(4[1 + c(d)]) − 1/2]² exp{−r/(4[1 + c(d)])} = (1/2) c′(d)²/[1 + c(d)]².15\n\nThe dependence of I_odd^simple on disparity is shown in Figure 4C (red dashed). Most of the Fisher information is located in the primary slope (compare Figure 4A), followed by the secondary slope to its left. The reason for this is the strong boost Fisher information gets when responses are lowest. We also see that the total Fisher information carried by a phase-disparity simple cell is significantly higher than that carried by a position-disparity simple cell (compare dashed red and blue lines), raising the question of what other advantages or trade-offs there are that make it beneficial for the primate brain to employ so many position-disparity ones. Intrinsic neuronal variability may provide part of the answer since the difference in Fisher information between both models decreases as intrinsic variability increases. Figure 4D shows the Fisher information after Gaussian noise has been added to the monocular inputs. 
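The integral in footnote 14, which yields the 1/2 prefactor in I_odd^simple, can be checked numerically. A short sketch (the choice α = 2 is arbitrary; the substitution x = t² removes the integrable 1/√x singularity at the origin):

```python
import numpy as np

alpha = 2.0  # arbitrary alpha > 0

# Footnote 14: int_0^inf x^(-1/2) (x/alpha - 1/2)^2 exp(-x/alpha) dx
#            = sqrt(pi) * sqrt(alpha) / 2.
# Substituting x = t^2 (so x^(-1/2) dx = 2 dt) gives a smooth integrand.
t = np.linspace(0.0, 12.0, 200_001)
integrand = 2.0 * (t**2 / alpha - 0.5) ** 2 * np.exp(-(t**2) / alpha)

# Trapezoidal rule, written out for portability across NumPy versions.
numeric = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))
exact = float(np.sqrt(np.pi * alpha) / 2.0)
```

The truncation at t = 12 is harmless here because the Gaussian factor makes the tail negligible.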
However, even in this high intrinsic noise regime (noise variance of the same order as tuning curve amplitude) the model with phase disparity carries significantly more total Fisher information.\n\n12What is outside of the scope of this paper but obvious from equation (7) is that Fisher information is maximized when the denominator, or the tuning function, is minimal. Within the context of the energy model, this occurs for neither the position-disparity model, nor the classic phase-disparity one, but for a model where the left and right RFs that are linearly combined are inverted with respect to each other (i.e. phase-shifted by π). In that case a(d) is a Gabor function with phase π and becomes zero at zero disparity such that the Fisher information diverges. Such neurons, called tuned-inhibitory (TI, [11]), make up a small minority of neurons in monkey V1.\n13The energy model as presented thus far models the responses of binocular complex cells. Disparity-selective simple cells are typically modeled by just one combination of left and right RFs, (ν_L^e + ν_R^o)² or (ν_L^o + ν_R^e)², and not the entire quadrature pair.\n14∫₀^∞ dx (1/√x)(x/α − 1/2)² exp(−x/α) = √π√α/2 for α > 0.\n15This derivation equally applies to the Fisher information of simple cells with position disparity by substituting a(d) for c(d) and we obtain I_even^simple(d) = (1/2) a′(d)²/[1 + a(d)]². This function is shown in Figure 4C (blue dashed).\n\n\f4 Discussion\n\nThe central idea of our paper is to evaluate the quality of a sensory code with respect to an inference task by taking stimulus variability into account, in particular that induced by irrelevant stimulus attributes. 
By framing stimulus-induced nuisance variability as noise, we were able to employ the existing framework of Fisher information for evaluating the standard model of early binocular processing with respect to inferring disparity from random dot stereograms.\nWe started by investigating the disparity-conditioned variability of the binocular response in the absence of intrinsic neuronal noise. We found that the response distributions are far from Poisson or Gaussian and – independent of stimulus disparity – are always peaked at zero (the mean response to uncorrelated images). The information contained in the correlations between left and right RF outputs is translated into a modulation of the neuron's mean firing rate primarily by altering the skew of the response distribution. This is quite different from the case of intrinsic noise and has implications for comparing different codes. It is noteworthy that these response distributions are entirely imposed by the sensory system – the combination of the structure of the external world with the internal processing model. Unlike the case of intrinsic noise, which is usually added ad hoc after the neuronal computation has been performed, in our case the computational model impacts the usefulness of the code beyond the traditionally reported tuning functions. This property extends to the case of population codes, the next step for future work. Of great importance for the performance of population codes are interneuronal correlations. Again, the noise correlations due to nuisance stimulus parameters are a direct consequence of the processing model and the structure of the external input.\nNext we compared the Fisher information available for our inference task at various stages of binocular processing. 
We computed the Fisher information available in the monocular inputs to binocular neurons in V1, after binocular combination, and after the squaring nonlinearity required to translate binocular correlations into mean firing rate modulation. We find that despite the great stimulus variability, the total Fisher information available in the inputs diverges and is only bounded by intrinsic neuronal variability. The same is still true after binocular combination for one flavor of the model considered here, the one employing phase disparity (pairing unlike RFs in the two eyes), but not for the other (position disparity), which has lost most of the information after the initial combination. At this point, our new approach allows us to ask a normative question: in what way should the monocular inputs be combined so as to lose a minimal amount of information about the relevant stimulus dimension? Is the combination proposed by the standard model to obtain even-symmetric tuning curves the only one to do so, or are there others that produce a different tuning curve, with a different response distribution that is better suited to inferring depth? Conversely, we can take our results for the model stages leading from simple to complex cells and compare them with the corresponding Fisher information computed from empirically observed distributions, to test our model assumptions.

Recently, Fisher information has been criticized as a tool for comparing population codes ([3, 2]). We note that our approach can be readily adapted to other measures like mutual information, or to their framework of neurometric function analysis, to compare the performance of different codes in a disparity discrimination task.

Another potentially promising avenue of future research would be to investigate the effect of thresholding on inference performance.
One reason that odd-symmetric tuning curves had higher Fisher information in the case we investigated is that, in the context of the energy model, odd-symmetric cells produce near-zero responses more often. However, it is known from empirical observations that fitting even-symmetric disparity tuning curves requires an additional thresholding output nonlinearity. It is unclear at this point to what extent such a change to the response distribution helps or hinders inference.

And finally, we suggest that considering the different shapes of response distributions induced by the specifics of the sensory modality might have an impact on the discussion about probabilistic population codes ([7, 8] and references therein). Cue integration, for instance, has usually been studied under the assumption of Poisson-like response distributions, an assumption that does not appear to hold when combining disparity cues from different parts of the visual field.

Acknowledgments

This work has been supported by the Bernstein award to MB (BMBF; FKZ: 01GQ0601).

References

[1] LF Abbott and P Dayan. The effect of correlated variability on the accuracy of a population code. Neural Comput, 11(1):91–101, 1999.

[2] P Berens, S Gerwinn, A Ecker, and M Bethge. Neurometric function analysis of population codes. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 90–98. 2009.

[3] M Bethge, D Rotermund, and K Pawelzik. Optimal short-term population coding: when Fisher information fails. Neural Comput, 14(10):2317–2351, 2002.

[4] C Blakemore and B Julesz. Stereoscopic depth aftereffect produced without monocular cues. Science, 171(968):286–288, 1971.

[5] BG Cumming and GC DeAngelis. The physiology of stereopsis. Annu Rev Neurosci, 24:203–238, 2001.

[6] P Dayan and LF Abbott.
Theoretical neuroscience: Computational and mathematical modeling of neural systems. MIT Press, 2001.

[7] J Fiser, P Berkes, G Orban, and M Lengyel. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn Sci, 14(3):119–130, 2010.

[8] WJ Ma, JM Beck, PE Latham, and A Pouget. Bayesian inference with probabilistic population codes. Nat Neurosci, 9(11):1432–1438, 2006.

[9] D Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Henry Holt and Co., Inc., New York, NY, USA, 1982.

[10] I Ohzawa, GC DeAngelis, and RD Freeman. Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science, 249(4972):1037–1041, 1990.

[11] GF Poggio and B Fischer. Binocular interaction and depth sensitivity in striate and prestriate cortex of behaving rhesus monkey. J Neurophysiol, 40(6):1392–1405, 1977.

[12] F Rieke, D Warland, RR de Ruyter van Steveninck, and W Bialek. Spikes: exploring the neural code. MIT Press, Cambridge, MA, 1997.