{"title": "Implicit encoding of prior probabilities in optimal neural populations", "book": "Advances in Neural Information Processing Systems", "page_first": 658, "page_last": 666, "abstract": "Optimal coding provides a guiding principle for understanding the representation of sensory variables in neural populations. Here we consider the influence of a prior probability distribution over sensory variables on the optimal allocation of cells and spikes in a neural population. We model the spikes of each cell as samples from an independent Poisson process with rate governed by an associated tuning curve. For this response model, we approximate the Fisher information in terms of the density and amplitude of the tuning curves, under the assumption that tuning width varies inversely with cell density. We consider a family of objective functions based on the expected value, over the sensory prior, of a functional of the Fisher information. This family includes lower bounds on mutual information and perceptual discriminability as special cases. In all cases, we find a closed form expression for the optimum, in which the density and gain of the cells in the population are power law functions of the stimulus prior. This also implies a power law relationship between the prior and perceptual discriminability. We show preliminary evidence that the theory successfully predicts the relationship between empirically measured stimulus priors, physiologically measured neural response properties (cell density, tuning widths, and firing rates), and psychophysically measured discrimination thresholds.", "full_text": "Implicit encoding of prior probabilities\n\nin optimal neural populations\n\nDeep Ganguli and Eero P. 
Simoncelli\nHoward Hughes Medical Institute, and\n\nCenter for Neural Science\n\nNew York University\nNew York, NY 10003\n\n{dganguli,eero}@cns.nyu.edu\n\nOptimal coding provides a guiding principle for understanding the representation\nof sensory variables in neural populations. Here we consider the in\ufb02uence of a\nprior probability distribution over sensory variables on the optimal allocation of\nneurons and spikes in a population. We model the spikes of each cell as samples\nfrom an independent Poisson process with rate governed by an associated tuning\ncurve. For this response model, we approximate the Fisher information in terms\nof the density and amplitude of the tuning curves, under the assumption that tun-\ning width varies inversely with cell density. We consider a family of objective\nfunctions based on the expected value, over the sensory prior, of a functional of\nthe Fisher information. This family includes lower bounds on mutual information\nand perceptual discriminability as special cases. In all cases, we \ufb01nd a closed\nform expression for the optimum, in which the density and gain of the cells in\nthe population are power law functions of the stimulus prior. This also implies\na power law relationship between the prior and perceptual discriminability. We\nshow preliminary evidence that the theory successfully predicts the relationship\nbetween empirically measured stimulus priors, physiologically measured neural\nresponse properties (cell density, tuning widths, and \ufb01ring rates), and psychophys-\nically measured discrimination thresholds.\n\n1\n\nIntroduction\n\nMany bottom up theories of neural encoding posit that sensory systems are optimized to repre-\nsent sensory information, subject to limitations of noise and resources (e.g., number of neurons,\nmetabolic cost, wiring length). 
It is dif\ufb01cult to test this concept because optimization of any formu-\nlation that attempts to correctly incorporate all of the relevant ingredients is generally intractable. A\nsubstantial literature has considered population models in which each neuron\u2019s mean response to a\nscalar variable is characterized by a tuning curve [e.g., 1\u20136]. For these simpli\ufb01ed models, several\npapers have examined the optimization of Fisher information, as a bound on mean squared error\n[7\u201310]. In these results, the distribution of sensory variables is assumed to be uniform and the pop-\nulations are assumed to be homogeneous with regard to tuning curve shape, spacing, and amplitude.\n\nThe distribution of sensory variables encountered in the environment is often non-uniform, and it is\nthus of interest to understand how variations in probability affect the design of optimal populations.\nIt would seem natural that a neural system should devote more resources to regions of sensory space\nthat occur with higher probability, analogous to results in coding theory [11]. At the single neuron\nlevel, several publications describe solutions in which monotonic neural response functions allocate\ngreater dynamic range to higher probability stimuli [12\u201315]. At the population level, non-uniform\nallocations of neurons with identical tuning curves have been shown to be optimal for non-uniform\nstimulus distributions [16, 17].\n\n\fHere, we examine the in\ufb02uence of a sensory prior on the optimal allocation of neurons and spikes\nin a population, and the implications of this optimal allocation for subsequent perception. Given\na prior distribution over a scalar stimulus parameter, and a resource budget of N neurons with an\naverage of R spikes/sec for the entire population, we seek the optimal shapes, positions, and am-\nplitudes of tuning curves. 
We assume a population with independent Poisson spiking, and consider a family of objective functions based on Fisher information. We then approximate the Fisher information in terms of two continuous resource variables, the density and gain of the tuning curves. This approximation allows us to obtain a closed-form solution for the optimal population. For all objective functions, we find that the optimal tuning curve properties (cell density, tuning width, and gain) are power-law functions of the stimulus prior, with exponents dependent on the specific choice of objective function. Through the Fisher information, we also derive a bound on perceptual discriminability, again in the form of a power law of the stimulus prior. Thus, our framework provides direct and experimentally testable links between sensory priors, properties of the neural representation, and perceptual discriminability. We provide preliminary evidence that these relationships are supported by experimental data.\n\n2 Encoding model\n\nWe assume a conventional model for a population of N neurons responding to a single scalar variable, s [1–6]. The number of spikes emitted (per unit time) by the nth neuron is a sample from an independent Poisson process, with mean rate determined by its tuning function, h_n(s). The probability density of the population response can be written as\n\np(\vec{r}|s) = \prod_{n=1}^{N} \frac{h_n(s)^{r_n} e^{-h_n(s)}}{r_n!}.\n\nWe also assume the total expected spike rate, R, of the population is fixed, which places a constraint on the tuning curves:\n\n\int p(s) \sum_{n=1}^{N} h_n(s) \, ds = R,    (1)\n\nwhere p(s) is the probability distribution of stimuli in the environment. We refer to this as a sensory prior, in anticipation of its future use in Bayesian decoding of the population response.\n\n3 Objective function\n\nWe now ask: what is the best way to represent values drawn from p(s) given the limited resources of N neurons and R total spikes?
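Before formalizing the objective, the encoding model of Section 2 can be sketched numerically. This is a minimal sketch with hypothetical parameter values: N Gaussian tuning curves tiling the unit interval, with the gain chosen so the rate constraint of Eq. (1) holds (here, under a uniform prior away from the edges).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical homogeneous population: N Gaussian tuning curves tiling [0, 1].
N, R = 20, 100.0                      # number of neurons, total expected spikes
centers = (np.arange(N) + 0.5) / N    # preferred stimuli on a uniform lattice
sigma = 0.55 / N                      # width 0.55 on the unit lattice, rescaled

def tuning(s):
    """Unit-height Gaussian bumps; h_n(s) is gain * tuning(s)[n]."""
    return np.exp(-0.5 * ((s - centers) / sigma) ** 2)

# Choose the gain so that sum_n h_n(s) ~ R for all s (the tiling assumption);
# Eq. (1) then holds for any prior p(s) supported away from the edges.
grid = np.linspace(0.2, 0.8, 201)
gain = R / np.mean([tuning(s).sum() for s in grid])

def population_response(s):
    """One sample of the independent-Poisson population response r | s."""
    return rng.poisson(gain * tuning(s))

# The average total spike count matches the budget R.
mean_total = np.mean([population_response(0.5).sum() for _ in range(2000)])
```

Averaging many samples, `mean_total` sits close to the budget R, illustrating how the constraint ties the gain to the total spike rate.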
To formulate a family of objective functions that depend on both p(s) and the tuning curves, we first rely on Fisher information, I_f(s), which can be written as a function of the tuning curves [1, 18]:\n\nI_f(s) = -\int p(\vec{r}|s) \frac{\partial^2}{\partial s^2} \log p(\vec{r}|s) \, d\vec{r} = \sum_{n=1}^{N} \frac{h_n'(s)^2}{h_n(s)}.\n\nThe Fisher information can be used to express lower bounds on mutual information [16], the variance of an unbiased estimator [18], and perceptual discriminability [19]. Specifically, the mutual information, I(\vec{r}; s), is bounded by:\n\nI(\vec{r}; s) \geq H(s) - \frac{1}{2} \int p(s) \log\left(\frac{2\pi e}{I_f(s)}\right) ds,    (2)\n\nwhere H(s) is the entropy, or amount of information inherent in p(s), which is independent of the neural population. The Cramer-Rao inequality allows us to express the minimum expected squared stimulus discriminability achievable by any decoder^1:\n\n\delta^2 \geq \Delta^2 \int \frac{p(s)}{I_f(s)} \, ds.    (3)\n\nThe constant \Delta determines the performance level at threshold in a discrimination task.\nWe formulate a generalized objective function that includes the Fisher bounds on information and discriminability as special cases:\n\n\arg\max_{h_n(s)} \int p(s) \, f\!\left( \sum_{n=1}^{N} \frac{h_n'(s)^2}{h_n(s)} \right) ds,   s.t.   \int p(s) \sum_{n=1}^{N} h_n(s) \, ds = R,    (4)\n\nwhere f(\cdot) is either the natural logarithm, or a power function. When f(x) = log(x), optimizing Eq. (4) is equivalent to maximizing the lower bound on mutual information given in Eq. (2). We refer to this as the infomax objective function. Otherwise, we assume f(x) = x^\alpha, for some exponent \alpha. Optimizing Eq. (4) with \alpha = -1 is equivalent to minimizing the squared discriminability bound expressed in Eq. (3). We refer to this as the discrimax objective function.\n\n4 How to optimize?\n\nThe objective function expressed in Eq. (4) is difficult to optimize because it is non-convex.
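As a sanity check on the Fisher information expression above, the closed form h'(s)^2 / h(s) for a single Poisson neuron can be compared against the definition E[(d/ds log p(r|s))^2]. The Gaussian tuning curve and its parameters here are hypothetical.

```python
import math
import numpy as np

# One Poisson neuron with a (hypothetical) Gaussian tuning curve.
def h(s, g=30.0, mu=0.0, sig=1.0):
    return g * math.exp(-0.5 * ((s - mu) / sig) ** 2)

s, eps = 0.4, 1e-6
hprime = (h(s + eps) - h(s - eps)) / (2 * eps)     # numerical h'(s)
closed_form = hprime ** 2 / h(s)                   # the expression in the text

# Definition: E[ (d/ds log p(r|s))^2 ] with r ~ Poisson(h(s)).
# For Poisson, d/ds log p(r|s) = (r / h(s) - 1) * h'(s).
rate = h(s)
r = np.arange(0, 300)
log_pmf = r * math.log(rate) - rate - np.array([math.lgamma(k + 1) for k in r])
pmf = np.exp(log_pmf)
definition = np.sum(pmf * ((r / rate - 1.0) * hprime) ** 2)
```

The two quantities agree because Var(r) = h(s) for a Poisson count, so the score variance collapses to h'(s)^2 / h(s).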
To make the problem tractable, we first introduce a parametrization of the population in terms of cell density and gain. The cell density controls both the spacing and width of the tuning curves, and the gain controls their maximum average firing rates. Second, we show that Fisher information can be closely approximated as a continuous function of density and gain. Finally, re-writing the objective function and constraints in these terms allows us to obtain closed-form solutions for the optimal tuning curves.\n\n4.1 Density and gain for a homogeneous population\n\nIf p(s) is uniform, then by symmetry, the Fisher information for an optimal neural population should also be uniform. We assume a convolutional population of tuning curves, evenly spaced on the unit lattice, such that they approximately "tile" the space:\n\n\sum_{n=1}^{N} h(s - n) \approx 1.\n\nWe also assume that this population has an approximately constant Fisher information:\n\nI_f(s) = \sum_{n=1}^{N} \frac{h'^2(s - n)}{h(s - n)} = \sum_{n=1}^{N} \phi(s - n) \approx I_{conv}.    (5)\n\nThat is, we assume that the Fisher information curves for the individual neurons, \phi(s - n), also tile the stimulus space. The value of the constant, I_{conv}, is dependent on the details of the tuning curve shape, h(s), which we leave unspecified. As an example, Fig. 1(a-b) shows that the Fisher information for a convolutional population of Gaussian tuning curves, with appropriate width, is approximately constant.\n\nNow we introduce two scalar values, a gain (g), and a density (d), that affect the convolutional population as follows:\n\nh_n(s) = g \, h\!\left(d\left(s - \frac{n}{d}\right)\right).    (6)\n\n^1 The conventional Cramer-Rao bound expresses the minimum mean squared error of any estimator, and in general requires a correction for the estimator bias [18].
Here, we use it to bound the squared discriminability of the estimator, as expressed in the stimulus space, which is independent of bias [19].\n\nFig. 1. Construction of a heterogeneous population of neurons. (a) Homogeneous population with Gaussian tuning curves on the unit lattice. The tuning width of \sigma = 0.55 is chosen so that the curves approximately tile the stimulus space. (b) The Fisher information of the convolutional population (green) is approximately constant. (c) Inset shows d(s), the tuning curve density. The cumulative integral of this density, D(s), alters the positions and widths of the tuning curves in the convolutional population. (d) The warped population, with tuning curve peaks (aligned with tick marks, at locations s_n = D^{-1}(n)), is scaled by the gain function, g(s) (blue). A single tuning curve is highlighted (red) to illustrate the effect of the warping and scaling operations. (e) The Fisher information of the inhomogeneous population is approximately proportional to d^2(s)g(s).\n\nThe gain modulates the maximum average firing rate of each neuron in the population. The density controls both the spacing and width of the tuning curves: as the density increases, the tuning curves become narrower, and are spaced closer together so as to maintain their tiling of stimulus space. The effect of these two parameters on Fisher information is:\n\nI_f(s) = d^2 g \sum_{n=1}^{N(d)} \phi(ds - n) \approx d^2 g \, I_{conv}.\n\nThe second line follows from the assumption of Eq.
(5), that the Fisher information of the convolutional population is approximately constant with respect to s.\nThe total resources, N and R, naturally constrain d and g, respectively. If the original (unit-spacing) convolutional population is supported on the interval (0, Q) of the stimulus space, then the number of neurons in the modulated population must be N(d) = Qd to cover the same interval. Under the assumption that the tuning curves tile the stimulus space, Eq. (1) implies that R = g for the modulated population.\n\n4.2 Density and gain for a heterogeneous population\n\nIntuitively, if p(s) is non-uniform, the optimal Fisher information should also be non-uniform. This can be achieved through inhomogeneities in either the tuning curve density or gain. We thus generalize density and gain to be continuous functions of the stimulus, d(s) and g(s), that warp and scale the convolutional population:\n\nh_n(s) = g(s_n) \, h(D(s) - n).    (7)\n\nOptimized function:            Infomax: f(x) = log x;   Discrimax: f(x) = -x^{-1};   General: f(x) = -x^\alpha, \alpha < 1/3\nDensity (Tuning width)^{-1}:   d(s) = N p(s);   N p^{1/2}(s);   N p^{(\alpha-1)/(3\alpha-1)}(s)\nGain:                          g(s) = R;   R p^{-1/2}(s);   R p^{2\alpha/(1-3\alpha)}(s)\nFisher information:            I_f(s) \propto R N^2 p^2(s);   \propto R N^2 p^{1/2}(s);   \propto R N^2 p^{2/(1-3\alpha)}(s)\nDiscriminability bound:        \delta_{min}(s) \propto p^{-1}(s);   \propto p^{-1/4}(s);   \propto p^{1/(3\alpha-1)}(s)\n\nTable 1. Optimal heterogeneous population properties, for objective functions specified by Eq. (9).\n\nHere, D(s) = \int_{-\infty}^{s} d(t) \, dt, the cumulative integral of d(s), warps the shape of the prototype tuning curve. The value s_n = D^{-1}(n) represents the preferred stimulus value of the (warped) nth tuning curve (Fig. 1(b-d)).
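The warped parameterization of Eq. (7) can be checked numerically: under the tiling assumption, the Fisher information of the warped population closely tracks d^2(s) g(s) I_conv. This is a sketch with hypothetical choices (Gaussian prior, flat gain, 50 neurons).

```python
import numpy as np
from math import erf, sqrt, pi

# Warp a convolutional population by D(s) (Eq. 7) and compare its Fisher
# information to d^2(s) g(s) I_conv. Prior, gain, and sizes are hypothetical.
N, g0, sig0 = 50, 10.0, 0.55            # neurons, flat gain, prototype width

def p(s):                               # Gaussian stimulus prior
    return np.exp(-0.5 * s ** 2) / sqrt(2 * pi)

def D(s):                               # cumulative density, maps R -> [0, N]
    return N * 0.5 * (1.0 + erf(s / sqrt(2.0)))

def h_proto(u):                         # prototype tuning curve, unit lattice
    return np.exp(-0.5 * (u / sig0) ** 2)

def fisher_warped(s, eps=1e-6):
    """I_f(s) = sum_n h_n'(s)^2 / h_n(s) for h_n(s) = g0 h(D(s) - n)."""
    n = np.arange(1, N + 1)
    hp = (g0 * h_proto(D(s + eps) - n) - g0 * h_proto(D(s - eps) - n)) / (2 * eps)
    hv = g0 * h_proto(D(s) - n)
    keep = hv > 1e-12                   # drop numerically dead neurons
    return np.sum(hp[keep] ** 2 / hv[keep])

def I_conv(u=0.5, eps=1e-6):
    """Fisher information of the unwarped unit-lattice population."""
    n = np.arange(-25, 26)
    hp = (h_proto(u + eps - n) - h_proto(u - eps - n)) / (2 * eps)
    hv = h_proto(u - n)
    keep = hv > 1e-12
    return np.sum(hp[keep] ** 2 / hv[keep])

s0 = 0.3
approx = (N * p(s0)) ** 2 * g0 * I_conv()   # d(s) = D'(s) = N p(s) here
exact = fisher_warped(s0)
```

With this density of curves the two values agree to within a few percent, which is the sense in which Eq. (8) treats density and gain as continuous resource variables.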
Note that the warped population retains the tiling properties of the original convolutional population. As in the uniform case, the density controls both the spacing and width of the tuning curves. This can be seen by rewriting Eq. (7) as a first-order Taylor expansion of D(s) around s_n:\n\nh_n(s) \approx g(s_n) \, h(d(s_n)(s - s_n)),\n\nwhich is a generalization of Eq. (6).\n\nWe can now write the Fisher information of the heterogeneous population of neurons in Eq. (7) as\n\nI_f(s) = \sum_{n=1}^{N} d^2(s) \, g(s_n) \, \phi(D(s) - n) \approx d^2(s) \, g(s) \, I_{conv}.    (8)\n\nIn addition to assuming that the Fisher information is approximately constant (Eq. (5)), we have also assumed that g(s) is smooth relative to the width of \phi(D(s) - n) for all n, so that we can approximate g(s_n) as g(s) and remove it from the sum. The end result is an approximation of Fisher information in terms of the continuous parameterization of cell density and gain. As earlier, the constant I_{conv} is determined by the precise shape of the tuning curves.\nAs in the homogeneous case, the global resource values N and R will place constraints on d(s) and g(s), respectively. In particular, we require that D(\cdot) map the entire input space onto the range [1, N], and thus D(\infty) = N, or equivalently, \int d(s) \, ds = N. To attain the proper rate, we use the fact that the warped tuning curves sum to unity (before multiplication by the gain function) and use Eq. (1) to obtain the constraint \int p(s) g(s) \, ds = R.\n\n4.3 Objective function and solution for a heterogeneous population\n\nApproximating Fisher information as proportional to squared density and gain allows us to re-write the objective function and resource constraints of Eq.
(4) as\n\n\arg\max_{d(s), g(s)} \int p(s) \, f\!\left(d^2(s) \, g(s)\right) ds,   s.t.   \int d(s) \, ds = N,  and  \int p(s) g(s) \, ds = R.    (9)\n\nA closed-form optimum of this objective function is easily determined by taking the gradient of the Lagrangian, setting to zero, and solving the resulting system of equations. Solutions are provided in Table 1 for the infomax, discrimax, and general power cases.\n\nIn all cases, the solution specifies a power-law relationship between the prior, and the density and gain of the tuning curves. In general, all solutions allocate more neurons, with correspondingly narrower tuning curves, to higher-probability stimuli. In particular, the infomax solution allocates an approximately equal amount of probability mass to each neuron. The shape of the optimal gain function depends on the objective function: for \alpha < 0, neurons with lower firing rates are used to represent stimuli with higher probabilities, and for \alpha > 0, neurons with higher firing rates are used for stimuli with higher probabilities. Note also that the global resource values, N and R, enter only as scale factors on the overall solution, allowing us to easily test the validity of the\n\nFig. 2. (a) Distribution of orientations averaged across three natural image databases [20–22]. (b) Density, or total number of Macaque V1 cells tuned to each preferred orientation [23].
(c) Orientation\ndiscrimination thresholds averaged across four human subjects [24]. (d & e) Infomax and discrimax\npredictions of orientation distribution. Blue: prediction from cell density. Red: prediction from dis-\ncrimination thresholds. Predictions were made by exponentiating the raw data with the appropriate\nexponent from Table 1, then normalizing to integrate to one.\n\npredicted relationships on experimental data. In addition to power-law relationships between tuning\nproperties and sensory priors, our formulation offers a direct relationship between the sensory prior\nand perceptual discriminability. This can be obtained by substituting the optimal solutions for d(s)\nand g(s) into Eq. (8), and using the resulting Fisher information to bound the discriminability,\n\n\u03b4(s) \u2265 \u03b4min(s) \u2261 \u2206/pIf (s) [19]. The resulting expressions are provided in Table 1.\n\n5 Experimental evidence\n\nOur framework predicts a quantitative link between the sensory prior, physiological parameters (the\ndensity, tuning widths, and gain of cells), and psychophysically measured discrimination thresholds.\nWe obtained subsets of these quantities for two visual stimulus variables, orientation and spatial\nfrequency, both of believed to be encoded by cells in primary visual cortex (area V1). For each\nvariable, we use the infomax and discrimax solutions to convert the physiological and perceptual\nmeasurements, using the appropriate exponents from Table 1, into predictions of the stimulus prior\n\u02c6p(s). We then compare these predictions to the empirically measured prior p(s).\n\n5.1 Orientation\n\nWe estimated the prior distribution of orientations in the environment by averaging orientation statis-\ntics across three natural image databases. Two databases consist entirely of natural scenes [20, 21],\nand the third contains natural and manmade scenes [22]. 
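The power-law solutions of Table 1, together with their resource constraints, can be sketched in a few lines of code. This is a toy illustration under stated assumptions: a discretized Gaussian prior and arbitrary budgets N and R, for the general power objective (alpha -> 0 recovers the infomax exponents).

```python
import numpy as np

# Table 1 as code: optimal density and gain are power laws of the prior, with
# multipliers fixed by the constraints  int d(s) ds = N  and  int p g ds = R.
def optimal_allocation(prior, ds, N, R, alpha):
    density = prior ** ((alpha - 1.0) / (3.0 * alpha - 1.0))
    density *= N / np.sum(density * ds)         # enforce  int d(s) ds = N
    gain = prior ** (2.0 * alpha / (1.0 - 3.0 * alpha))
    gain *= R / np.sum(prior * gain * ds)       # enforce  int p(s) g(s) ds = R
    return density, gain

s = np.linspace(-4.0, 4.0, 2001)
ds = s[1] - s[0]
prior = np.exp(-0.5 * s ** 2)
prior /= np.sum(prior) * ds                     # toy Gaussian prior

# Discrimax (alpha = -1): density ~ p^(1/2), gain ~ p^(-1/2).
d_dx, g_dx = optimal_allocation(prior, ds, N=100, R=50.0, alpha=-1.0)
# Near-infomax (alpha ~ 0): density ~ p, gain ~ constant.
d_im, g_im = optimal_allocation(prior, ds, N=100, R=50.0, alpha=1e-6)
```

Consistent with the text, the discrimax gain assigns lower firing rates to high-probability stimuli (alpha < 0), while the infomax gain is flat.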
Orientation statistics depend on scale, so we measured statistics at a scale matching the psychophysical experiment from which we obtained perceptual data. The average distribution of orientations exhibits higher probability at the cardinal orientations (vertical and horizontal) than at the oblique orientations (Fig. 2(a)). Measurements of cell density for a population of 79 orientation-tuned V1 cells in Macaque [23] show more cells tuned to the cardinal orientations than the oblique orientations (Fig. 2(b)). Finally, perceptual discrimination thresholds, averaged across four human subjects [24], show a similar bias (Fig. 2(c)), with humans better able to discriminate orientations near the cardinal directions.\n\nAll of the orientation data exhibit similar biases, but our theory makes precise and testable predictions about these relationships. If a neural population is designed to maximize information, then the cell density and inverse discrimination thresholds should match the stimulus prior, as expressed in the infomax column of Table 1. We normalize these predictions to integrate to one (since the theory\n\nFig. 3. (a) Distribution of spatial frequencies computed across two natural image databases [20, 21]. (b) Cell density as a function of preferred spatial frequency for a population of 317 V1 cells [25, 28]. Dark blue: average number of cells tuned to each spatial frequency.
Light blue: average tuning width. (c) Average spatial frequency discrimination thresholds. Dark red: thresholds obtained at 10% contrast averaged across 3 human subjects [26]. Light red: thresholds obtained at 25% contrast averaged across 7-13 human subjects [27]. (d & e) Infomax and discrimax predictions of spatial frequency distribution. Blues: predictions from cell density and tuning widths. Reds: predictions from discrimination thresholds.\n\nprovides only the shapes of the functions, up to unknown values of the resource variables N and R), and plot them against the measured prior (Fig. 2(d)). We see that the predictions arising from cell density and discrimination thresholds are consistent with one another, and both are consistent with the stimulus prior. This is especially remarkable given that the measurements come from very different domains (in the case of the perceptual and physiological data, different species). For the discrimax objective function, the exponents in the power-law relationships (expressed in Table 1) are too small, resulting in poor qualitative agreement between the stimulus prior and predictions from the physiology and perception (Fig. 2(e)). For example, predicting the prior from perceptual data, under the discrimax objective function, requires exponentiating discrimination thresholds to the fourth power, resulting in an over-exaggeration of the cardinal bias.\n\n5.2 Spatial frequency\n\nWe obtained a prior distribution over spatial frequencies averaged across two natural image databases [20, 21]. For each image, we computed the magnitude spectrum, and averaged over orientation. We averaged these across images, and fit the result with a power law of exponent -1.3 (Fig. 3(a)). We also obtained spatial frequency tuning properties for a population of 317 V1 cells [25]. On average, we see there are more cells, with correspondingly narrower tuning widths, tuned to low spatial frequencies (Fig. 3(b)).
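The prediction recipe used for Figs. 2 and 3 is simple enough to state as code: raise the measured quantity to the exponent implied by Table 1, then normalize to integrate to one. The thresholds below are made-up numbers with a cardinal bias, for illustration only; they are not the data of [24] or [26, 27].

```python
import numpy as np

# Predict a stimulus prior from a measured quantity via a Table 1 exponent.
def predict_prior(measurement, exponent, ds):
    pred = np.asarray(measurement, dtype=float) ** exponent
    return pred / (np.sum(pred) * ds)            # normalize to unit mass

theta = np.arange(0.0, 180.0, 10.0)              # orientation bins (deg)
ds = 10.0
# Hypothetical thresholds: lowest at the cardinals (0 and 90 deg).
thresholds = 3.0 + 2.0 * np.sin(np.deg2rad(2.0 * theta)) ** 2

# Infomax:   delta_min ~ p^(-1)    =>  p_hat ~ threshold^(-1).
# Discrimax: delta_min ~ p^(-1/4)  =>  p_hat ~ threshold^(-4).
p_infomax = predict_prior(thresholds, -1.0, ds)
p_discrimax = predict_prior(thresholds, -4.0, ds)
```

As the text notes, the fourth-power discrimax exponent exaggerates the cardinal bias relative to the infomax prediction.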
These data support the model assumption that tuning width is inversely proportional to cell density. We also obtained average discrimination thresholds for sinusoidal gratings of different spatial frequencies from two studies (Fig. 3(c)). The gratings were shown at 10% contrast to 3 human subjects for one study [26], and at 25% contrast to 7-13 human subjects for the other [27]. The thresholds show that, on average, humans are better at discriminating low spatial frequencies.\n\nWe again test the infomax and discrimax solutions by comparing predicted distributions obtained from the physiological and perceptual data, to the measured prior. We normalize each prediction to integrate to the corresponding area under the prior. The infomax case shows striking agreement between the measured stimulus prior, and predictions based on the physiological and perceptual measurements (Fig. 3(d)). However, as in the orientation case, discrimax predictions are poor (Fig. 3(e)), suggesting that information maximization provides a better optimality principle for explaining the neural and perceptual encoding of spatial frequency than discrimination maximization.\n\n6 Discussion\n\nWe have examined the influence of sensory priors on the optimal allocation of neural resources, as well as the influence of these optimized resources on subsequent perception. For a family of objective functions, we obtain closed-form solutions specifying power-law relationships between the probability distribution of a sensory variable encountered in the environment, the tuning properties of a population that encodes that variable, and the minimum perceptual discrimination thresholds achievable for that variable. We've shown preliminary supportive evidence for these relationships for two different perceptual attributes.\n\nOur analysis requires several approximations and assumptions in order to arrive at an analytical solution.
We \ufb01rst rely on lower bounds on mutual information and discriminability based on Fisher\ninformation. Fisher information is known to provide a poor bound on mutual information when\nthere are a small number of neurons, a short decoding time, or non-smooth tuning curves [16, 29].\nIt also provides a poor bound on supra-threshold discriminability [30, 31]. But note that we do not\nrequire the bounds on either information or discriminability to be tight, but rather that their optima\nbe close to that of their corresponding true objective functions. We also made several assumptions in\nderiving our results: (1) the tuning curves, h(D(s)\u2212 n), evenly tile the stimulus space; (2) the single\nneuron Fisher informations, \u03c6(D(s) \u2212 n), evenly tile the stimulus space; and (3) the gain function,\ng(s), varies slowly and smoothly over the width of \u03c6(D(s) \u2212 n). These assumptions allow us to\napproximate Fisher information in terms of cell density and gain (Fig. 1(e)), to express the resource\nconstraints in simple form, and to obtain a closed-form solution to the optimization problem.\n\nOur framework offers an important generalization of the population coding literature, allowing for\nnon-uniformity of sensory priors, and corresponding heterogeneity in tuning and gain properties.\nNevertheless, it suffers from many of the same simpli\ufb01cations found in previous literature. First,\nneural spike trains are not Poisson, and they are (at least in some cases) correlated [32]. Second,\ntuning curve encoding models only specify neural responses to single stimulus values. The model\nshould be generalized to handle arbitrary combinations of stimuli. And third, the response model\nshould be generalized to handle multi-dimensional sensory inputs. 
Each of these limitations offers\nan important opportunity for future work.\n\nFinally, our encoding model has direct implications for Bayesian decoding, a problem that has re-\nceived much attention in recent literature [e.g., 5, 6, 33\u201335]. A Bayesian decoder must have knowl-\nedge of prior probabilities, but it is unclear how such knowledge is obtained or represented in the\nbrain [34]. Previous studies assume that prior probabilities are either uniform [6], represented in\nthe spiking activity of a separate population of neurons [5], or represented (in sample form) in the\nspontaneous activity [35]. Our encoding formulation provides a mechanism whereby the prior is im-\nplicitly encoded in the density and gains of tuning curves, which presumably arise from the strength\nof synaptic connections. We are currently exploring the requirements for a decoder that can correctly\nutilize this form of embedded prior information to obtain Bayesian estimates of stimulus variables.\n\nReferences\n\n[1] HS Seung and H Sompolinsky. Simple models for reading neuronal population codes. Proc. Natl. Acad.\n\nSci. U.S.A., 90:10749\u201310753, Nov 1993.\n\n[2] RS Zemel, P Dayan, and A Pouget. Probabilistic interpretation of population codes. Neural Comput,\n\n10(2):403\u2013430, Feb 1998.\n\n[3] A Pouget, P Dayan, and RS Zemel. Inference and computation with population codes. Annu Rev Neurosci,\n\n26:381\u2013410, 2003.\n\n[4] TD Sanger. Neural population codes. Curr Opin Neurobiol, 13(2):238\u2013249, Apr 2003.\n[5] WJ Ma, JM Beck, PE Latham, and A Pouget. Bayesian inference with probabilistic population codes.\n\nNat Neurosci, 9(11):1432\u20131438, Nov 2006.\n\n[6] M Jazayeri and JA Movshon. Optimal representation of sensory information by neural populations. Nat.\n\nNeurosci., 9:690\u2013696, May 2006.\n\n[7] K Zhang and TJ Sejnowski. Neuronal tuning: To sharpen or broaden? Neural Comput, 11(1):75\u201384, Jan\n\n1999.\n\n[8] A Pouget, S Deneve, JC Ducom, and PE Latham. 
Narrow versus wide tuning curves: What's best for a population code? Neural Comput, 11(1):85–90, Jan 1999.\n\n[9] WM Brown and A Bäcker. Optimal neuronal tuning for finite stimulus spaces. Neural Comput, 18(7):1511–1526, Jul 2006.\n\n[10] MA Montemurro and S Panzeri. Optimal tuning widths in population coding of periodic variables. Neural Comput, 18(7):1555–1576, Jul 2006.\n\n[11] A Gersho and RM Gray. Vector quantization and signal compression. Kluwer Academic Publishers, Norwell, MA, 1991.\n\n[12] S Laughlin. A simple coding procedure enhances a neuron's information capacity. Z. Naturforschung, 36c:910–912, 1981.\n\n[13] JP Nadal and N Parga. Nonlinear neurons in the low-noise limit: a factorial code maximizes information transfer. Network: Computation in Neural Systems, 5:565–581(17), 1994.\n\n[14] T von der Twer and DI MacLeod. Optimal nonlinear codes for the perception of natural colours. Network, 12(3):395–407, Aug 2001.\n\n[15] MD McDonnell and NG Stocks. Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. Phys Rev Lett, 101:58103–58107, 2008.\n\n[16] N Brunel and JP Nadal. Mutual information, Fisher information, and population coding. Neural Comput, 10(7):1731–1757, Oct 1998.\n\n[17] NS Harper and D McAlpine. Optimal neural population coding of an auditory spatial cue. Nature, 430(7000):682–686, Aug 2004.\n\n[18] D Cox and D Hinkley. Theoretical statistics. London: Chapman and Hall, 1974.\n\n[19] P Seriès, AA Stocker, and EP Simoncelli. Is the homunculus "aware" of sensory adaptation? Neural Comput, 21(12):3271–3304, Dec 2009.\n\n[20] JH van Hateren and A van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc Biol Sci, 265(1394):359–366, Mar 1998.\n\n[21] E Doi, T Inui, TW Lee, T Wachtler, and TJ Sejnowski.
Spatiochromatic receptive \ufb01eld properties de-\nrived from information-theoretic analyses of cone mosaic responses to natural scenes. Neural Comput,\n15(2):397\u2013417, Feb 2003.\n\n[22] A Olmos and FAA Kingdom. McGill calibrated image database, http://tabby.vision.mcgill.ca, 2004.\n[23] RJ Mans\ufb01eld. Neural basis of orientation perception in primate vision. Science, 186(4169):1133\u20131135,\n\nDec 1974.\n\n[24] AR Girshick, MS Landy, and EP Simoncelli. Bayesian line orientation perception: Human prior expecta-\n\ntions match natural image statistics. In Frontiers in Systems Neuroscience (CoSyNe)., 2010.\n\n[25] JR Cavanaugh, W Bair, and JA Movshon. Selectivity and spatial distribution of signals from the receptive\n\n\ufb01eld surround in macaque v1 neurons. J Neurophysiol, 88(5):2547\u20132556, Nov 2002.\n\n[26] T Caelli, H Brettel, I Rentschler, and R Hilz. Discrimination thresholds in the two-dimensional spatial\n\nfrequency domain. Vision Res, 23(2):129\u2013133, 1983.\n\n[27] D Regan, S Bartol, TJ Murray, and KI Beverley. Spatial frequency discrimination in normal vision and in\n\npatients with multiple sclerosis. Brain, 105 (Pt 4):735\u2013754, Dec 1982.\n\n[28] JR Cavanaugh, W Bair, and JA Movshon. Nature and interaction of signals from the receptive \ufb01eld center\n\nand surround in macaque v1 neurons. J Neurophysiol, 88(5):2530\u20132546, Nov 2002.\n\n[29] M Bethge, D Rotermund, and K Pawelzik. Optimal short-term population coding: when \ufb01sher informa-\n\ntion fails. Neural Comput, 14(10):2317\u20132351, Oct 2002.\n\n[30] M Shamir and H Sompolinsky. Implications of neuronal diversity on population coding. Neural Comput,\n\n18(8):1951\u20131986, Aug 2006.\n\n[31] P Berens, S Gerwinn, A Ecker, and M Bethge. Neurometric function analysis of population codes. In\n\nAdvances in Neural Information Processing Systems 22, pages 90\u201398, 2009.\n\n[32] E Zohary, MN Shadlen, and WT Newsome. 
Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370(6485):140–143, Jul 1994.\n\n[33] DC Knill and A Pouget. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci, 27(12):712–719, Dec 2004.\n\n[34] EP Simoncelli. Optimal estimation in sensory systems. In M Gazzaniga, editor, The Cognitive Neurosciences, IV, chapter 36, pages 525–535. MIT Press, Oct 2009.\n\n[35] J Fiser, P Berkes, G Orbán, and M Lengyel. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn Sci, 14(3):119–130, Mar 2010.\n", "award": [], "sourceid": 861, "authors": [{"given_name": "Deep", "family_name": "Ganguli", "institution": null}, {"given_name": "Eero", "family_name": "Simoncelli", "institution": null}]}