{"title": "Power-law efficient neural codes provide general link between perceptual bias and discriminability", "book": "Advances in Neural Information Processing Systems", "page_first": 5071, "page_last": 5080, "abstract": "Recent work in theoretical neuroscience has shown that information-theoretic \"efficient\" neural codes, which allocate neural resources to maximize the mutual information between stimuli and neural responses, give rise to a lawful relationship between perceptual bias and discriminability that is observed across a wide variety of psychophysical tasks in human observers (Wei & Stocker 2017). Here we generalize these results to show that the same law arises under a much larger family of optimal neural codes, introducing a unifying framework that we call power-law efficient coding. Specifically, we show that the same lawful relationship between bias and discriminability arises whenever Fisher information is allocated proportional to any power of the prior distribution. This family includes neural codes that are optimal for minimizing Lp error for any p, indicating that the lawful relationship observed in human psychophysical data does not require information-theoretically optimal neural codes. Furthermore, we derive the exact constant of proportionality governing the relationship between bias and discriminability for different power laws (which includes information-theoretically optimal codes, where the power is 2, and so-called discrimax codes, where power is 1/2), and different choices of optimal decoder. As a bonus, our framework provides new insights into \"anti-Bayesian\" perceptual biases, in which percepts are biased away from the center of mass of the prior. We derive an explicit formula that clarifies precisely which combinations of neural encoder and decoder can give rise to such biases.", "full_text": "Power-law ef\ufb01cient neural codes provide general link\n\nbetween perceptual bias and discriminability\n\nMichael J. 
Morais & Jonathan W. Pillow\n\nPrinceton Neuroscience Institute & Department of Psychology\n\nPrinceton University\n\nmjmorais, pillow@princeton.edu\n\nAbstract\n\nRecent work in theoretical neuroscience has shown that ef\ufb01cient neural codes, which\nallocate neural resources to maximize the mutual information between stimuli and neural\nresponses, give rise to a lawful relationship between perceptual bias and discriminability\nin psychophysical measurements (Wei & Stocker 2017, [1]). Here we generalize these\nresults to show that the same law arises under a much larger family of optimal neural codes,\nwhich we call power-law ef\ufb01cient codes. These codes provide a unifying framework for\nunderstanding the relationship between perceptual bias and discriminability, and how it\ndepends on the allocation of neural resources. Speci\ufb01cally, we show that the same lawful\nrelationship between bias and discriminability arises whenever Fisher information is allocated\nproportional to any power of the prior distribution. This family includes neural codes\nthat are optimal for minimizing Lp error for any p, indicating that the lawful relationship\nobserved in human psychophysical data does not require information-theoretically optimal\nneural codes. Furthermore, we derive the exact constant of proportionality governing the\nrelationship between bias and discriminability for different choices of power law exponent\nq, which includes information-theoretic (q = 2) as well as \u201cdiscrimax\u201d (q = 1/2)\nneural codes, and different choices of decoder. As a bonus, our framework provides new\ninsights into \u201canti-Bayesian\u201d perceptual biases, in which percepts are biased away from\nthe center of mass of the prior.
We derive an explicit formula that clari\ufb01es precisely which\ncombinations of neural encoder and decoder can give rise to such biases.\n\n1 Introduction\n\nThere are relatively few general laws governing perceptual inference, the two most prominent being\nthe Weber-Fechner law [2] and Stevens\u2019 law [3]. Recently, Wei and Stocker [1] proposed a new\nperceptual law governing the relationship between perceptual bias and discriminability, and showed\nthat it holds across a wide variety of psychophysical tasks in human observers.\nPerceptual bias, $b(x) = E[\\hat{x}|x] - x$, is the difference between the average stimulus estimate $\\hat{x}$ and\nits true value x. Perceptual discriminability D(x) characterizes the sensitivity with which stimuli\nclose to x can be discriminated, equivalently the just-noticeable difference (JND); this is formalized as\nthe stimulus increment D(x) such that the stimuli $x + \\eta D(x)$ and $x - (1 - \\eta) D(x)$ (for $\\eta$ between\n0 and 1) can be correctly distinguished with probability $\\delta$, for some value of $\\delta$. Note that by\nthis de\ufb01nition, lower discriminability D(x) implies higher sensitivity to small changes in x, that is,\nimproved ability to discriminate.\nThe law proposed by Wei and Stocker asserts that bias and discriminability are related according to:\n\n$$b(x) \\propto \\frac{d}{dx} D(x)^2, \\qquad (1)$$\n\nwhere the right-hand side is the derivative with respect to x of the discriminability squared. The\nrelationship is backed by remarkably diverse experimental support, crossing sensory modalities,\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nFigure 1: (Left) Schematic of Bayesian observer model under power-law ef\ufb01cient coding.
On\neach trial, a stimulus $x^*$ is sampled from the prior distribution p(x), and encoded into a neural\nresponse $y^*$ according to the encoding distribution $p(y|x^*)$. Inference involves computing the\nposterior $p(x|y^*) \\propto p(y^*|x)p(x)$, and the optimal point estimate $\\hat{x}$ minimizes the expected loss\n$E_{p(x|y^*)}[L(\\hat{x}, x)]$. Power-law ef\ufb01cient coding stipulates that the encoding distribution p(y|x) has\nFisher information proportional to $p(x)^q$ for some power q. Thus the prior in\ufb02uences both encoding\n(via the Fisher information) and decoding (via its in\ufb02uence on the posterior). (Right) Intuitive example\nof bias and discriminability: adjusting a crooked picture frame. The stimulus x represents the angle\noff of vertical. Discriminability D(x) measures the minimum adjustment needed for the observer to\ndetect that it became better (or worse). Bias b(x) measures the offset of the estimated angle $\\hat{x}$ from\nthe true angle, in this case the overestimation of the crookedness. Adapted with edits from [1].\n\nstimulus statistics, and even task designs. At the heart of this experiment-unifying result is the\nBayesian observer model, \ufb02exibly instantiating perception as Bayesian inference in an encoding and\ndecoding cascade with a structure optimized to statistics in the natural environment [4, 5].\nWei and Stocker derived their law under the assumption of an information-theoretically optimal\nneural code, which previous work has shown to hold when Fisher information J(x) is proportional\nto $p(x)^2$, the square of the prior distribution [6\u20138]. A critical follow-up question is whether this\ncondition is necessary for the emergence of the perceptual law. Does the perceptual law require\ninformation-theoretically optimal neural coding, or does the same bias-discriminability relationship\narise from other families of (non-information-theoretic) optimal codes? Here we provide a de\ufb01nitive\nanswer to this question.
We use a Bayesian observer model to generalize the Wei-Stocker law beyond\ninformation-theoretically optimal neural codes to a family that we call power-law ef\ufb01cient codes.\nThese codes are characterized by a power-law relationship between Fisher information and prior,\n$J(x) \\propto p(x)^q$, for any exponent q > 0. Critically, we show that this family replicates all key results\nin the original Wei and Stocker model.\nWe \ufb01rst review the derivation of the Wei & Stocker result governing the relationship between bias and\ndiscriminability (Section 2). We then develop a generic variational objective for power-law ef\ufb01cient\ncoding that reveals a many-to-one mapping from objective to resultant optimal code (Section 3). We\nuse this objective to derive a nonlinear relationship between bias and discriminability that, in the limit\nof high signal-to-noise ratio (SNR), reproduces the Wei & Stocker result for all power-law ef\ufb01cient\ncodes, with an analytic expression for the constant of proportionality (Section 4). In simulations, we\nexplore a range of SNRs and power-law ef\ufb01cient codes to verify these results, and examine a variety\nof decoders including posterior mode, median, and mean estimators (Section 5), demonstrating the\nuniversality of the bias-discriminability relationship across a broad space of models.\n\n2 The Wei & Stocker Law\n\nThe perceptual law proposed by Wei and Stocker can be seen to arise if perceptual judgments arise\nfrom a Bayesian ideal observer model with an appropriate allocation of neural resources. Perceptual\ninference in the Bayesian observer model (Fig. 
1) consists of two stages: (1) encoding, in which\nan external stimulus x is mapped to a noisy internal representation y according to some encoding\ndistribution p(y|x); and (2) decoding, in which the internal representation y is converted to a point\nestimate $\\hat{x}$ using the information available in the posterior distribution,\n\n$$p(x|y) \\propto p(y|x)p(x), \\qquad (2)$$\n\nwhich (according to Bayes\u2019 rule) is proportional to the product of p(y|x), known as the likelihood\nwhen considered as a function of x, and a prior distribution p(x), which re\ufb02ects the environmental\nstimulus statistics.\n\nFigure 2: The high-SNR regime within which the bias-discriminability relationship linearizes, under\nthe same sample prior as in Figure 1. (A) Schematic illustration of how prior (top) relates to\ndiscriminability (middle) and bias (bottom). (B-C) Increasing SNR k narrows the likelihood function\n(orange) and posterior (gray) relative to the prior (black), and makes the posterior more Gaussian.\n(D) The bias-discriminability relationship has arbitrary curvature at low-SNR, but converges to a line\nwith known slope in the high-SNR limit.\n\n
Technically, the Bayes estimate is one that minimizes an expected loss under the\nposterior: $\\hat{x}_{Bayes} = \\arg\\min_{\\hat{x}} \\int dx\\, p(x|y) L(x, \\hat{x})$, for some choice of loss function (e.g., $L(x, \\hat{x}) = (x - \\hat{x})^2$,\nwhich produces the \u201cBayes least squares estimator\u201d).\nOptimizing the encoding stage of such a model involves specifying the encoding distribution p(y|x).\nIntuitively, a good encoder is one that allocates neural resources such that stimuli that are common\nunder the prior p(x) are encoded more faithfully than stimuli that are uncommon under the prior.\nRecent work from several groups [6\u20139] has shown that this allocation problem can be addressed\ntractably in the high-SNR regime using Fisher information, which quanti\ufb01es the local curvature of\nthe log-likelihood at x:\n\n$$J(x) = E_{y|x}\\left[ -\\frac{\\partial^2}{\\partial x^2} \\log p(y | x) \\right]. \\qquad (3)$$\n\nIn the high-SNR regime, Fisher information provides a well-known approximation to the mutual\ninformation between stimulus and response: $I(x, y) \\approx \\frac{1}{2} \\int dx\\, p(x) \\log J(x) + \\text{const}$. This relationship\narises from the fact that asymptotically, the maximum likelihood estimate $\\hat{x}$ behaves like\na Gaussian random variable with variance $\\sigma^2 = 1/J(x)$ [9, 7, 10]. This relationship holds only in\nthe high-SNR limit, which is also pivotal to the perceptual law. Previous work has shown that the\nallocation of Fisher information that maximizes mutual information between x and y is proportional\nto the square of the prior, such that\n\n$$J(x) \\propto p(x)^2. \\qquad (4)$$\n\nThe perceptual law of Wei & Stocker can be obtained by combining this formula with two other\nexisting results relating Fisher information to bias and discriminability. First, Series, Stocker &\nSimoncelli 2009 [11] showed that Fisher information placed a bound on discriminability.
In the high-SNR\nregime, this bound can be made tight, resulting in the identity $D(x) \\propto 1/\\sqrt{J(x)}$, where the\nconstant of proportionality depends on the desired threshold performance (e.g., 1 if the threshold\n$\\delta \\approx 76\\%$). Second, the bias of a Bayesian ideal observer was shown in [8, 1] to relate to the prior\ndistribution via the relationship $b(x) \\propto \\frac{d}{dx} \\frac{1}{p(x)^2}$.\nCombining these three proportionalities, we recover the perceptual law proposed by Wei & Stocker:\n\n$$b(x) \\overset{[1, 8]}{\\propto} \\frac{d}{dx} \\frac{1}{p(x)^2} \\overset{[8]}{\\propto} \\frac{d}{dx} \\frac{1}{J(x)} \\overset{[11]}{\\propto} \\frac{d}{dx} D(x)^2. \\qquad (5)$$\n\nFigure 2A illustrates the relationship between these quantities for a simulated example, highlighting\nits dependence on the high-SNR limit. In this paper, we will show that the condition $J(x) \\propto p(x)^2$\nis stronger than necessary, and that the same perceptual law arises from any allocation of Fisher\ninformation proportional to a power of the prior distribution, that is, $J(x) \\propto p(x)^q$ for any q > 0.\n\n[Figure 2 panel labels: Prior, Discriminability, Bias; increasing SNR (SNR = 70, 270, 1040, 4000, > 10^4)]\n\nBefore showing this result, we \ufb01rst revisit the normative setting in which such power-law allocations\nof Fisher information are optimal.\n\n3 Power-law ef\ufb01cient coding\n\nWe \ufb01rst show from where this power-law relationship between Fisher information and prior can\nemerge in an ef\ufb01cient neural code, and what factors determine the choice of power q. Previous work\non information-maximizing or \u201cinfomax\u201d codes [1, 8] has started from the following constrained\noptimization problem:\n\n$$\\arg\\max_{J(x)} \\int dx\\, p(x) \\log J(x) \\quad \\text{subject to} \\quad C(x) = \\int dx\\, \\sqrt{J(x)} \\le c, \\qquad (6)$$\n\nwhere log J(x) provides a well-known approximation to mutual information (up to an additive\nconstant) as described above. Solving for the optimal Fisher information J(x) using variational\ncalculus and Lagrange multipliers produces (eq. 
4) with the equality $J(x) = c^2 p(x)^2$.\nWe can consider a more general method for de\ufb01ning normatively optimal codes by investigating\nFisher information allocated according to\n\n$$\\arg\\max_{J(x)} -\\int dx\\, p(x) J(x)^{-\\alpha} \\quad \\text{subject to} \\quad C(x) = \\int dx\\, J(x)^{\\beta} \\le c, \\qquad (7)$$\n\nwith parameters $\\alpha \\ge 0$ de\ufb01ning the coding objective and $\\beta > 0$ specifying a resource constraint.\nSeveral canonical normatively optimal coding frameworks emerge from speci\ufb01c settings of the\nparameter $\\alpha$, independent of the value of $\\beta$:\n\n1. In the limit $\\alpha \\to 0$, this is equivalent to maximizing mutual information, since\n$\\log J(x) = \\lim_{\\alpha \\to 0} \\frac{J(x)^{-\\alpha} - 1}{-\\alpha}$ [12].\n\n2. If $\\alpha = 1$, this corresponds to minimizing the L2 reconstruction error, sometimes called\n\u201cdiscrimax\u201d [6, 7] because it also optimizes squared discriminability.\n\n3. For the general case $\\alpha = p/2$, for any p > 0, this optimization corresponds to minimizing\nthe Lp reconstruction error under the approximation $E_{x,y}\\left(|\\hat{x} - x|^p\\right) \\approx E_x J(x)^{-p/2}$ [12].\n\nHere we show that this third relationship arises under a more general setting. We prove a novel bound\non the mean Lp error of any estimator for any level of SNR (see Supplemental Materials for proof,\nwhich builds on results from [13, 14]).\nTheorem (Generalized Bayesian Cram\u00e9r-Rao bound for Lp error). For any point estimator $\\hat{x}$ of x,\nthe mean Lp error averaged over $x \\sim p(x)$, $y|x \\sim p(y|x)$, is bounded by\n\n$$\\iint dx\\, dy\\, p(y, x)\\, |\\hat{x}(y) - x|^p \\ge \\int dx\\, p(x) J(x)^{-p/2} \\qquad (8)$$\n\nfor any p > 0, where J(x) is the Fisher information at x.\n\nThus, the objective given in (eq. 7) captures a wide range of optimal neural codes via different settings\nof $\\alpha$, including but not limited to classic ef\ufb01cient coding. We can solve this objective for any value\nof coding parameter $\\alpha$ and constraint parameter $\\beta > 0$ to obtain the optimal allocation of Fisher\ninformation.
In all cases, the optimal Fisher information is proportional to the prior distribution raised\nto a power, which we therefore refer to as power-law ef\ufb01cient codes:\n\n$$J_{opt}(x) = c^{1/\\beta} \\left( \\frac{p(x)^{\\gamma}}{\\int dx\\, p(x)^{\\gamma}} \\right)^{1/\\beta} \\triangleq k\\, p(x)^q, \\qquad (9)$$\n\nwhere $\\gamma = \\beta/(\\beta + \\alpha)$ and exponent $q = 1/(\\beta + \\alpha)$ (see Supplemental Materials for derivation).\nThe normalized power function of the prior in parentheses is known as the escort distribution with\nparameter $\\gamma$ [15]. Escort distributions arise naturally in power-law generalizations of logarithmic\nquantities such as mutual information, and could offer a reinterpretation of ef\ufb01cient coding and neural\ncoding more generally in terms of key theories such as maximum entropy, source coding, and Fisher\ninformation in generalized geometries [16, 17]. Here, we focus on the right-most expression, which\ncharacterizes a power-law ef\ufb01cient code in terms of the power q and constant of proportionality\n$k = c^{1/\\beta} \\left( \\int dx\\, p(x)^{\\gamma} \\right)^{-1/\\beta}$. One interesting feature of the power-law ef\ufb01cient coding framework is\nthat the exponent q, which determines how Fisher information is allocated relative to the prior, depends\non both the coding parameter $\\alpha$ and the constraint parameter $\\beta$ via the relationship $q = 1/(\\beta + \\alpha)$.\nThis implies that the optimal allocation of Fisher information is multiply determined, and reveals an\nambiguity between coding desideratum and constraint in any optimal code.\nIn the particular case of infomax coding, where $\\alpha = 0$, we obtain $q = 1/\\beta$, meaning that the power\nlaw exponent q is determined entirely by the constraint, and the escort parameter $\\gamma$ equals 1. Previous\nwork [7, 8, 12], therefore, could be interpreted to be implicitly or explicitly forcing the choice of\n$\\beta = 1/2$. Any power-law ef\ufb01cient code with $J(x) = k p(x)^q$ could be putatively \u201cinfomax\u201d if we\nde\ufb01ned the constraint such that $\\beta = 1/q$.
For example, the so-called discrimax encoder developed\nin [7] in which $J(x) \\propto p(x)^{1/2}$ could result from an infomax objective function ($\\alpha = 0$) if we only\nset the constraint $\\beta = 2$. Rather than highlighting a pitfall of our procedure, this ambiguity instead\nhighlights (i) the universality of the power-law generalization we present here, and (ii) the need to\nconsider how other features of the observer model could further constrain the encoder to a uniquely\ninfomax code.\n\n4 Deriving linear and nonlinear bias-discriminability relationships\n\nNext, we wish to go beyond proportionality and determine the precise relationship between bias\nand discriminability under the power-law ef\ufb01cient coding framework described above. However,\nany optimization of Fisher information, including ours, doesn\u2019t prescribe a method for selecting a\nparametric encoding distribution p(y | x) associated with a particular power-law ef\ufb01cient code, that\nis, a distribution with Fisher information allocated according to $J(x) = k p(x)^q$. For simplicity, we\ntherefore consider a power-law ef\ufb01cient code that is parametrized as Gaussian in y with mean x:\n\n$$p(y | x) = \\mathcal{N}\\left(x, \\frac{1}{k p(x)^q}\\right) = \\sqrt{\\frac{k p(x)^q}{2\\pi}} \\exp\\left(-\\frac{k p(x)^q}{2} (y - x)^2\\right), \\qquad (10)$$\n\nand we allocate Fisher information using a stimulus-dependent variance $\\sigma^2 = 1/(k p(x)^q)$. This is the\nonly con\ufb01guration that allocates Fisher information appropriately and is also Gaussian in y.\nThe parametrization of this encoder differs from that used by Wei and Stocker [1, 8], but critically\nhas the same Fisher information. We can show that all key analytical results continue to hold in\ntheir parametrization when extended to power-law ef\ufb01cient codes, and that we ameliorate several\nissues in their models (see Supplemental Materials for comparisons and proofs).
It also replicates\nthe key results obtained with Wei and Stocker\u2019s parametrization, namely repulsive \"anti-Bayesian\"\nbiases, in which the average Bayes least squares estimate is biased away from the prior relative to the\ntrue stimulus [8, 18]. But we prefer this parametrization for its simplicity and interpretability in terms\nof its parameters k and q.\nAt the decoding stage, Bayesian inference involves computing a posterior distribution over stimuli x,\nusing the encoding distribution (eq. 10) as the likelihood:\n\n$$p(x | y) = \\frac{p(y | x) p(x)}{p(y)} = \\frac{p(x)}{p(y)} \\sqrt{\\frac{k p(x)^q}{2\\pi}} \\exp\\left(-\\frac{k p(x)^q}{2} (y - x)^2\\right). \\qquad (11)$$\n\nIn the high-SNR limit, the likelihood narrows and the log-prior can be well-approximated with a\nquadratic about the true stimulus $x_0$, such that\n\n$$\\log p(x) \\approx a_0 + a_1 (x - x_0) + \\frac{1}{2} a_2 (x - x_0)^2,$$\n\nwhere the coef\ufb01cients $a_0$, $a_1$, and $a_2$ are implicitly functions of $x_0$. For the MAP estimator $\\hat{x}_{MAP}$,\nthe bias in response to the stimulus at $x = x_0$ can be expressed in this limit as (see Supplemental\nMaterials for proof)\n\n$$b(x) = \\frac{-\\frac{(2+q)}{2q} \\frac{1}{{d'}^2} \\frac{d}{dx} D(x)^2}{1 - \\left(q a_1 - \\frac{a_2}{a_1}\\right) \\frac{(2+q)}{2q} \\frac{1}{{d'}^2} \\frac{d}{dx} D(x)^2}, \\qquad (12)$$\n\nwhere $d' = \\sqrt{2} Z(\\delta)$ is the d-prime statistic for a \ufb01xed performance $\\delta$, and $Z(\\cdot)$ is the inverse normal\nCDF. We refer to this as our nonlinear relationship because it expresses bias b(x) as a nonlinear\nfunction of the squared discriminability $D(x)^2$.
This relationship makes testable nonlinear predictions\nbetween bias and discriminability that depend on the shape of the prior at each value of x through the\nlocal prior curvature parameters $a_1$ and $a_2$.\nWe recover a linear relationship between bias and discriminability in the higher-SNR limit when\n$\\left| \\frac{1}{{d'}^2} \\frac{d}{dx} D(x)^2 \\right| \\ll \\left| \\frac{(2+q)}{2q} \\left(q a_1 - \\frac{a_2}{a_1}\\right) \\right|^{-1}$, satis\ufb01ed if the SNR $k \\gg \\left| \\frac{(2+q)}{2} \\left(q a_1 - \\frac{a_2}{a_1}\\right) e^{-q a_0} \\right|$ for all $x_0$.\nThis speci\ufb01cation of the high-SNR regime reveals that the likelihood must be so sharp around the\nstimulus that the prior, by comparison, becomes so broad that it is nearly \ufb02at. When satis\ufb01ed, the\n\ufb01nal result is the following linear relationship between bias and discriminability:\n\n$$b(x) = -\\left( \\frac{(2 + q)}{2q} \\frac{1}{{d'}^2} \\right) \\frac{d}{dx} D(x)^2, \\qquad (13)$$\n\nwhich indicates a negative constant of proportionality for all q. There is no contribution of $a_1$ or $a_2$\nto the coef\ufb01cient of proportionality; only q matters. Thus, we con\ufb01rm that for power-law ef\ufb01cient\ncodes generally, the Wei-Stocker law $b(x) \\propto \\frac{d}{dx} D(x)^2$ holds in the limit of high SNR for all x.\n\n5 Simulating the model under different SNRs and power-law ef\ufb01cient codes\n\nWe used simulated data to test our derived nonlinear and linear relationships between bias and\ndiscriminability (eqs. 12 & 13). We restricted these simulations to the high-SNR regimes in which\nthe analytical predictions provide accurate descriptions of the simulated data, and examined the\nqualitative differences that emerge for different powers of the power-law ef\ufb01cient code.
We consider\na sweep of both of these parameters, k and q, under different decoder loss functions, which yield\ndifferent Bayesian estimators with very different implications for the resulting bias.\nIn all simulations, we propagate each stimulus $x \\sim p(x)$ on a \ufb01nely tiled grid through a Bayesian\nobserver model numerically, computing a posterior $p(x|y) \\propto p(x)\\, \\mathcal{N}(y; x, 1/(k p(x)^q))$ for a power-law\nef\ufb01cient code under many powers q and SNRs k, and for each we computed the Bayesian estimators\nassociated with various loss functions of interest. We repeated this procedure for a large number of\nrandom smooth priors. The bias-discriminability relationship will be most clearly observed if our\ndata can tile the space of discriminability and bias, achieved if the underlying priors are maximally\ndiverse and rich in curvature. As such, we draw random priors as exponentiated draws from Gaussian\nprocesses on $[-\\pi, \\pi]$, according to\n\n$$p(x) = \\frac{1}{Z} \\exp(f), \\quad \\text{where } f \\sim \\mathcal{GP}(0, K), \\qquad (14)$$\n\nfor Z as a normalizing constant, and K the radial basis function kernel wherein\n\n$$K_{ij} = \\rho \\exp\\left(-\\frac{1}{2\\ell^2} \\|x_i - x_j\\|^2\\right), \\qquad (15)$$\n\nwith magnitude $\\rho = 1$ and lengthscale $\\ell = 0.75$, selected such that a typical prior was roughly\nbimodal. In this way, the vector elements are arti\ufb01cially ordered on a line to enforce smoothness. To\nprevent truncating probability mass at the endpoints of the domain, we only record measurements on\nthe interior subinterval $[-\\pi/2, \\pi/2]$.\nWhile we offer more details in the following sections, we \ufb01rst overview brie\ufb02y the goals of the two\nremaining \ufb01gures. In Figure 4, we explore how the quality of predictions made by the nonlinear and\nlinear relationships in (eqs. 12 and 13) changes as a function of the SNR k for various power-law\nef\ufb01cient coding powers q.
In Figure 5, we examine how the slope of the linear relationship changes\nas a function of q, and compare our analytical predictions to simulated results.\n\n5.1 Tests of prior-dependent nonlinear and prior-independent linear relationships\nThe nonlinear and linear bias-discriminability relationships together form a broad generalization of\nthe perceptual law beyond Wei and Stocker\u2019s prior work [1]. As SNR increases and the relationship\nconverges onto a line (Figure 2D), the \ufb02uctuations along that line are captured by both relationships,\nbut the nonlinear relationship captures some additional \ufb02uctuations orthogonal to the predicted line\n(Figure 3).\n\nFigure 3: Nonlinear and linear bias-discriminability relationships for SNR k = 102 and \u201cdiscrimax\u201d\ncode q = 1/2 under an exemplar random prior. Bias and discriminability match closely under the\nlinear relationship (A), but any deviations from that line are well-captured by the weak nonlinear\nrelationship (C). Deviations from the true bias (red) are best observed if we subtract the true bias\nfrom the predictions of the linear and nonlinear models (gray and black curves, respectively; B, D).\n\nBoth nonlinear and linear relationships are exceptional approximations of the true bias\n(Figure 3A), but do not capture equivalent features of the curvature \u2013 deviations are often at very\ndifferent values of x (Figure 3B).
We can equivalently view this parametrically as a function of\ndiscriminability (Figure 3C, D).\nWe quantify the quality of our nonlinear and linear predictions as a function of SNR by measuring\nan error ratio R, de\ufb01ned as the ratio between the mean-squared error of the bias predictions under a\nmodel (nonlinear, linear) and the total mean-squared error, such that\n\n$$R = -\\log\\left( \\frac{MSE_{model}}{MSE_{null}} \\right) = -\\log\\left( \\frac{\\int dx\\, (\\hat{b}_{model} - b(x))^2}{\\int dx\\, b(x)^2} \\right), \\qquad (16)$$\n\nwhere $\\hat{b}_{model}$, for clarity, represents a bias predicted under a given relationship (eqs. 12 or 13). We use\nthe negative logarithm such that R > 0 implies model predictive performance better than null. This\nratio is de\ufb01ned for each prior, which we then average over 200 random priors for all simulations.\nThe null model in all cases is 0 everywhere. We want each simulation\u2019s mean-squared error to be\nnormalized according to how much bias the underlying prior introduced \u2013 if the prior were \ufb02at, our\nGaussian encoding model is unbiased and symmetric for all moments such that bias is 0 everywhere.\nFor MAP estimation, we use our analytical nonlinear and linear relationships as the models (Figure\n4A,B), further using the difference between the two, $\\Delta R = -\\log(MSE_{nonlin}/MSE_{lin})$, to measure\nthe relative performance of each model to the other (Figure 4C). We only highlight the regions where\nboth models are making sensible predictions (R > 0). For posterior median and mean computation,\nin the absence of analytical results, we use as the model a linear function regressed to the data.
While\nby de\ufb01nition the estimated R > 0, the degree to which it\u2019s positive makes it still a useful surrogate\nfor measuring the relative linearity (Figure 4D,E).\nThe bias-discriminability relationship emerges from modest SNR k for any estimator (MAP, posterior\nmedian, posterior mean) and power-law ef\ufb01cient code with power q, converging into the linear\nrelationship as SNR k increases (Figure 4). The analytical results for the MAP estimator model the\ndata well, as the linear and nonlinear error ratio measures cleanly cross 0 and peak (Figure 4A,B). The\ndecrease after this peak is a numerical precision issue and isn\u2019t informative of perceptual processing \u2013\nboth bias and discriminability measurements collapse into zero as k increases. The minimum SNR\nrequired for good predictions is lower for the nonlinear relationship, and this form makes better\npredictions than the linear relationship throughout, evidenced by the error ratio difference $\\Delta R$ being\npositive (Figure 4C). Moreover, the slope of the relationship as predicted from (eq. 13), as a function\nof q, exactly matches simulations (Figure 5A).\n\n5.2 Posterior median and mean estimators, anti-Bayesian repulsive biases\nAnalytical results for posterior median and posterior mean estimators are nontrivial, and beyond the\nscope of this work. However, they are likely tractable, and simulations offer interesting insight into\npotentially useful functional forms of an equivalent linear bias-discriminability relationship in the\nhigh-SNR limit. The posterior median could be asymptotically unbiased in q or unbiased at q = 2, as\nthe bias tends to 0 rapidly, and the linear relationship erodes (Figure 4D, Figure 5B).
\n\n[Figure 3 panel labels: Linear relation, Nonlinear relation, True bias; axes: Discriminability, Bias, Difference from true bias]\n\nFigure 4: Linearity and nonlinearity indices of analytical predictions (MAP, A-C) or regression\n\ufb01ts (Posterior median and mean, D-E) as k increases. A, B. Error ratio R of linear and nonlinear\nrelationships, respectively, as a function of increasing SNR k and increasing ef\ufb01cient coding power q\nas the color brightens from red to yellow. C. Error ratio difference $\\Delta R$ shows a lower minimal SNR\nfor the nonlinear model to make effective predictions of bias than the linear model. Regions in which\neither model is not making sensible predictions (R < 0) are faded. D, E. Estimated linear error ratio\n(by regression) for posterior median and mean estimators, respectively. F. Optimal SNR for linear\nbias-discriminability as a function of ef\ufb01cient coding power and Bayesian estimator.\n\nThe posterior mean, on the other hand, is asymptotically unbiased for q = 1 and has repulsive biases away from\nthe prior for q > 1 (Figure 4E), a hallmark of the Bayesian observer introduced by Wei and Stocker\npreviously [8]. Although we have not developed a formal derivation, we propose the following simple\nrelationship parametrizing the slope, after using curve-\ufb01tting to explore various functional forms:\n\n$$b(x) \\overset{?}{=} \\frac{\\log(q)}{\\sqrt{q}} \\frac{1}{{d'}^2} \\frac{d}{dx} D(x)^2. \\qquad (17)$$\n\nq = 1 is a natural transition point for these attractive-repulsive biases (see the zero-crossing in\nFigure 5C). Recalling (13), in this setting, the Fisher information is simply a scaling of the prior. For\nq < 1, low-probability events have boosted probability mass since $p(x) < p(x)^q$.
Meanwhile, for\nq > 1, these same events have compressed probability mass since $p(x) > p(x)^q$. For a power-law\nef\ufb01cient code, q is determining the weight of the tails of this likelihood. In this way, the speci\ufb01c\ninfomax setting of q = 2 demonstrates repulsive biases not because it corresponds to a mutual\ninformation-maximizing encoder, but because of the tail behaviors it induces by being greater than 1.\n\n6 Discussion\n\nWe have shown that a perceptual law governing the relationship between perceptual bias and\ndiscriminability arises under a wide range of Bayesian optimal encoding models. This extends previous\nwork showing that the law arises from information-theoretically optimal codes [1], which our work\nincludes as a special case. Maximization of mutual information therefore does not provide a privileged\nexplanation for the neural codes underlying human perceptual behavior, in the sense that the same\nlawful relationship emerges for all members of the more general family of power-law ef\ufb01cient codes.\nWe have also extended the perceptual law put forth by Wei and Stocker by deriving the exact constant\nof proportionality between bias and derivative of squared discriminability for arbitrary choices of\npower-law exponent.\n\n[Figure 4 panel labels: Linear error ratio, Nonlinear error ratio, Error ratio difference (Nonlin model better / Lin model better), Estimated linear error ratio, SNR k at optimum of linear error ratio (MAP, Posterior median, Posterior mean); axes: SNR (k), Efficient coding power (q). Figure 5 panel labels: MAP estimator, Post. median estimator, Post. mean estimator; axis: Efficient coding power (q), with q = 1 marked]\n\nFigure 5: Linear slope of the bias-discriminability relation as a function of the ef\ufb01cient coding power\nq. A. 
MAP estimator for analytical predictions (solid line) and simulations (dots). B. Posterior\nmedian estimator for simulations. C. Posterior mean estimator for simulations fit parsimoniously by\na simple equation. Note that the slope changes sign after q = 1 (vertical line). Before this crossing,\nbiases are prior-attractive (q < 1), and after are prior-repulsive, or \u201canti-Bayesian\u201d (q > 1).\n\nMore generally, we have shown that power-law efficient codes arise under a general optimization\nprogram that trades off the cost of making errors against a constraint on the total Fisher information\n(eq. 7). Any particular allocation of Fisher information relative to the prior is therefore optimal under\nmultiple settings of loss function and constraint, and information-theoretically optimal coding is\nconsistent with a range of different power-law relationships between Fisher information and prior.\nThis implies that the form of an optimal power-law efficient code depends on specifying a choice of\nconstraint as well as a choice of loss function.\nAlthough our work shows that Wei and Stocker\u2019s perceptual law is equally consistent with multiple\nforms of optimal encoding, other recent work has suggested that information-maximization provides\na better explanation of both perceptual and neural data than other loss functions [19]. One interesting\ndirection for future work will be to determine whether other members of the power-law efficient\ncoding family can provide equally accurate accounts of such data.\nAnother direction for future work will be to consider more general families of efficient neural codes.\nWe hypothesize that, since power functions form a basis set for any function, we could show that\nWei and Stocker\u2019s law emerges whenever neural resources are allocated according to any strictly\nmonotonic function of the prior (with positive support). 
Such an efficient coding principle could\nimply\n\n$$J(x) \\propto G(p(x)) \\;\\overset{?}{\\Longrightarrow}\\; b(x) \\propto \\frac{d}{dx} D(x)^2 \\quad \\text{for strictly monotone } G : \\{p(x) \\mid x \\in \\mathcal{X}\\} \\to \\mathbb{R}_+ \\qquad (18)$$\n\nCritically, various specialized neural circuits throughout the brain needn\u2019t adopt the same power-law\nq, or function G(\u00b7). The end result is the same: biases nudge perceptual estimates towards stimuli\nthat are more (or potentially less) discriminable (cf. eq. 1: bias is a scaled step along the gradient\nof discriminability). Neural populations could therefore specialize computations by refining q or\nG(\u00b7) to precisely privilege or discount representations of stimuli with different prior probabilities.\nMutual information is one of many such specializations, and is likely sensible under some conditions,\nbut not necessarily all. In this way, the bias-discriminability relationship could be the signature\nof a unifying organizational principle governing otherwise diverse neural populations that encode\nsensory information. It could be useful to reconceptualize \u201cefficient codes\u201d accordingly as a broad\nfamily of codes governed by this more general normative principle, within which an efficient code\nputatively allocates neural resources such that stimuli that are common under the prior are encoded\nmore faithfully than stimuli that are uncommon under the prior. We note that this echoes our initial\nintuitions of a good encoder, and we\u2019ve provided evidence to suggest that this simple condition could\nbe sufficient.\nAcknowledgments\nWe thank David Zoltowski and Nicholas Roy for helpful comments. 
MJM was supported by an NSF\nGraduate Research Fellowship; JWP was supported by grants from the McKnight Foundation, Simons\nCollaboration on the Global Brain (SCGB AWD1004351) and NSF CAREER Award (IIS-1150186).\n\nReferences\n[1] Xue-Xin Wei and Alan A Stocker. Lawful relation between perceptual bias and discriminability. Proceedings of the National Academy of Sciences, 114(38):10244\u201310249, 2017.\n[2] Gustav Fechner. Elements of psychophysics. Vol. I. New York, 1966.\n[3] Stanley S Stevens. On the psychophysical law. Psychological Review, 64(3):153, 1957.\n[4] Harrison H Barrett, Jie Yao, Jannick P Rolland, and Kyle J Myers. Model observers for assessment of image quality. Proceedings of the National Academy of Sciences, 90(21):9758\u20139765, 1993.\n[5] Alan A Stocker and Eero P Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9(4):578, 2006.\n[6] Deep Ganguli and Eero P Simoncelli. Implicit encoding of prior probabilities in optimal neural populations. In Advances in Neural Information Processing Systems, pages 658\u2013666, 2010.\n[7] Deep Ganguli and Eero P Simoncelli. Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Computation, 26(10):2103\u20132134, 2014.\n[8] Xue-Xin Wei and Alan A Stocker. A Bayesian observer model constrained by efficient coding can explain \u2018anti-Bayesian\u2019 percepts. Nature Neuroscience, 18(10):1509, 2015.\n[9] Nicolas Brunel and Jean-Pierre Nadal. Mutual information, Fisher information, and population coding. Neural Computation, 10(7):1731\u20131757, 1998.\n[10] Xue-Xin Wei and Alan A. Stocker. 
Mutual information, Fisher information, and efficient coding. Neural Computation, 28(2):305\u2013326, 2016.\n[11] Peggy Seri\u00e8s, Alan A Stocker, and Eero P Simoncelli. Is the homunculus \u201caware\u201d of sensory adaptation? Neural Computation, 21(12):3271\u20133304, 2009.\n[12] Zhuo Wang, Alan A Stocker, and Daniel D Lee. Efficient neural codes that minimize Lp reconstruction error. Neural Computation, 28(12):2656\u20132686, 2016.\n[13] Harry L Van Trees. Detection, estimation, and modulation theory, part I: detection, estimation, and linear modulation theory. John Wiley & Sons, 2004.\n[14] Steve Yaeli and Ron Meir. Error-based analysis of optimal tuning functions explains phenomena observed in sensory neurons. Frontiers in Computational Neuroscience, 4:130, 2010.\n[15] Jean-Fran\u00e7ois Bercher. Source coding with escort distributions and R\u00e9nyi entropy bounds. Physics Letters A, 373(36):3235\u20133238, 2009.\n[16] L Lore Campbell. A coding theorem and R\u00e9nyi\u2019s entropy. Information and Control, 8(4):423\u2013429, 1965.\n[17] J-F Bercher. On escort distributions, q-Gaussians and Fisher information. In AIP Conference Proceedings, volume 1305, pages 208\u2013215. AIP, 2011.\n[18] Jonathan W Pillow. Explaining the especially pink elephant. Nature Neuroscience, 18(10):1435\u20131436, 2015. URL http://dx.doi.org/10.1038/nn.4122.\n[19] Deep Ganguli and Eero P Simoncelli. Neural and perceptual signatures of efficient sensory coding. arXiv preprint arXiv:1603.00058, 2016.\n[20] Jean-Fran\u00e7ois Bercher. On generalized Cram\u00e9r\u2013Rao inequalities, generalized Fisher information and characterizations of generalized q-Gaussian distributions. 
Journal of Physics A: Mathematical and Theoretical, 45(25):255\u2013303, 2012.\n", "award": [], "sourceid": 2449, "authors": [{"given_name": "Michael", "family_name": "Morais", "institution": "Princeton University"}, {"given_name": "Jonathan", "family_name": "Pillow", "institution": "Princeton University"}]}