{"title": "A probabilistic population code based on neural samples", "book": "Advances in Neural Information Processing Systems", "page_first": 7070, "page_last": 7079, "abstract": "Sensory processing is often characterized as implementing probabilistic inference: networks of neurons compute posterior beliefs over unobserved causes given the sensory inputs. How these beliefs are computed and represented by neural responses is much-debated (Fiser et al. 2010, Pouget et al. 2013). A central debate concerns the question of whether neural responses represent samples of latent variables (Hoyer & Hyvarinnen 2003) or parameters of their distributions (Ma et al. 2006) with efforts being made to distinguish between them (Grabska-Barwinska et al. 2013).\nA separate debate addresses the question of whether neural responses are proportionally related to the encoded probabilities (Barlow 1969), or proportional to the logarithm of those probabilities (Jazayeri & Movshon 2006, Ma et al. 2006, Beck et al. 2012). \nHere, we show that these alternatives -- contrary to common assumptions -- are not mutually exclusive and that the very same system can be compatible with all of them.\nAs a central analytical result, we show that modeling neural responses in area V1 as samples from a posterior distribution over latents in a linear Gaussian model of the image implies that those neural responses form a linear Probabilistic Population Code (PPC, Ma et al. 2006). In particular, the posterior distribution over some experimenter-defined variable like \"orientation\" is part of the exponential family with sufficient statistics that are linear in the neural sampling-based firing rates.", "full_text": "A probabilistic population code based on neural\n\nsamples\n\nSabyasachi Shivkumar\u2217, Richard D. Lange\u2217, Ankani Chattoraj\u2217, Ralf M. 
Haefner

Brain and Cognitive Sciences, University of Rochester
{sshivkum, rlange, achattor, rhaefne2}@ur.rochester.edu

Abstract

Sensory processing is often characterized as implementing probabilistic inference: networks of neurons compute posterior beliefs over unobserved causes given the sensory inputs. How these beliefs are computed and represented by neural responses is much-debated (Fiser et al. 2010, Pouget et al. 2013). A central debate concerns the question of whether neural responses represent samples of latent variables (Hoyer & Hyvärinen 2003) or parameters of their distributions (Ma et al. 2006), with efforts being made to distinguish between them (Grabska-Barwinska et al. 2013). A separate debate addresses the question of whether neural responses are proportionally related to the encoded probabilities (Barlow 1969), or proportional to the logarithm of those probabilities (Jazayeri & Movshon 2006, Ma et al. 2006, Beck et al. 2012). Here, we show that these alternatives – contrary to common assumptions – are not mutually exclusive and that the very same system can be compatible with all of them. As a central analytical result, we show that modeling neural responses in area V1 as samples from a posterior distribution over latents in a linear Gaussian model of the image implies that those neural responses form a linear Probabilistic Population Code (PPC, Ma et al. 2006). In particular, the posterior distribution over some experimenter-defined variable like "orientation" is part of the exponential family with sufficient statistics that are linear in the neural sampling-based firing rates.

1 Introduction

In order to guide behavior, the brain has to infer behaviorally relevant but unobserved quantities from observed sensory inputs.
Bayesian inference provides a normative framework to do so; however, the computations required to compute posterior beliefs about those variables exactly are typically intractable. As a result, the brain needs to perform these computations in an approximate manner. The nature of this approximation is unclear, with two principal classes having emerged as candidate hypotheses: parametric (variational) and sampling-based [8, 20].
In the first class, neural responses are interpreted as the parameters of the probability distributions that the brain computes and represents. The most popular members of this class are Probabilistic Population Codes (PPCs, [13, 4, 3, 2, 21, 19]). Common PPCs are based on the empirical observation that neural variability is well-described by an exponential family with linear sufficient statistics. Applying Bayes' rule to compute the posterior probability, p(s|r), over some task-relevant scalar quantity, s, from the neural population response, r, one can write [2]:

p(s|r) ∝ g(s) exp[h(s)ᵀr]    (1)

where each entry of h(s) represents a stimulus-dependent kernel characterizing the contribution of each neuron's response to the distribution, and g(s) is some stimulus-dependent function that is independent of r. Importantly, the neural responses, r, are linearly related to the logarithm of the probability rather than the probability itself. This has been argued to be a convenient choice for the brain, since it allows important probabilistic operations like evidence integration over time and across cues to be implemented as linear operations on firing rates [2]. In addition, PPC-like codes are typically "distributed" since the belief over a single variable is distributed over the activity of many neurons, and different low-dimensional projections of those activities may represent beliefs over multiple variables simultaneously [19]. Furthermore, because s is defined by the experimenter and not explicitly inferred by the brain in our model, we call it "implicit."
In the second class of models, instead of representing parameters, neural responses are interpreted as samples from the represented distribution. First proposed by Hoyer & Hyvärinen (2003), this line of research has been elaborated in the abstract, showing how it might be implemented in neural circuits [7, 18, 5], as well as for concrete generative models designed to explain properties of neurons in early visual cortex [14, 15, 24, 12, 16, 10]. Here, each neuron (or a subset of principal neurons) represents a single latent variable in a probabilistic model of the world. The starting point for these models is typically a specific generative model of the inputs which is assumed to have been learnt by the brain from earlier sensory experience, effectively assuming a separation of time-scales for learning and inference that is empirically justified at least for early visual areas.

∗Equal contribution

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: General setup: Our model performs sampling-based inference over x in a probabilistic model of the image, I. In a given experiment, the image is generated according to the experimenter's model that turns a scalar stimulus s, e.g. orientation, into an image observed by the brain. The samples drawn from the model are then probabilistically "decoded" in order to infer the implied probability distribution over s from the brain's perspective. While the samples shown here are binary, our derivation of the PPC is agnostic to whether they are binary or continuous, or to the nature of the brain's prior over x.
Rather than being the starting point as for PPCs, neural variability in sampling-based models emerges as a consequence of any uncertainty in the represented posterior. Importantly, samples have the same domain as the latents and do not normally relate to either log probability or probability directly.
This paper will proceed as illustrated in Figure 1: First, we will define a simple linear Gaussian image model as has been used in previous studies. Second, we will show that samples from this model approximate an exponential family with linear sufficient statistics. Third, we will relate the implied PPC, in particular the kernels, h(s), to the projective fields in our image model. Fourth, we will discuss the role of nuisance variables in our model. And finally, we will show that under the assumption of binary latents in the image model, neural firing rates are both proportional to probability (of the presence of a given image element) and to log probability (of implicitly encoded variables like orientation).

2 A neural sampling-based model

We follow previous work in assuming that neurons in primary visual cortex (V1) implement probabilistic inference in a linear Gaussian model of the input image [14, 15, 12, 6, 10]:

p(I|x) = N(I; Ax, σ²ₓ𝟙)    (2)

where N(y; μ, Σ) denotes the probability distribution function of a normal random variable (mean μ and covariance Σ) evaluated at y, and 𝟙 is the identity matrix. The observed image, I, is drawn from a Normal distribution around a linear combination of the projective fields (PFₙ), A = (PF₁, . . . , PF_N), of all the neurons (1, . . . , N) weighted by their activations (responses), x = (x₁, . . . , x_N)ᵀ. The projective fields can be thought of as the brain's learned set of basis functions over images.
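As a concrete illustration of inference in the generative model of equation (2), the following sketch draws samples x⁽ⁱ⁾ from the posterior p(x|I). The sizes, the random projective fields, and the standard normal prior p_brain(x) are our own illustrative assumptions (the paper leaves the prior general, e.g. sparse); the Gaussian prior is chosen only because it makes the posterior Gaussian and exactly sampleable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (our choice, not from the paper): D pixels, N latent features.
D, N = 16, 8
A = rng.normal(size=(D, N))        # columns play the role of projective fields PF_n
sigma_x = 0.5                      # pixel noise sigma_x of the brain's model

def posterior_samples(img, A, sigma_x, n_samples, rng):
    """Draw samples x^(i) ~ p(x|I) for the model of equation (2), assuming a
    standard normal prior p_brain(x) so that the posterior is Gaussian and can
    be sampled exactly (an assumption made for this sketch only)."""
    n = A.shape[1]
    precision = np.eye(n) + A.T @ A / sigma_x**2   # prior + likelihood precision
    cov = np.linalg.inv(precision)
    mean = cov @ A.T @ img / sigma_x**2
    chol = np.linalg.cholesky(cov)
    return mean + rng.normal(size=(n_samples, n)) @ chol.T

img = A @ rng.normal(size=N) + sigma_x * rng.normal(size=D)  # a synthetic image I
xs = posterior_samples(img, A, sigma_x, n_samples=5000, rng=rng)
x_bar = xs.mean(axis=0)            # mean response, the x̄ used throughout the paper
```

With a sparse (e.g. exponential) prior the posterior is no longer Gaussian and one would resort to MCMC, but the role of x̄ in what follows is unchanged.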
The main empirical justification for this model consists in the fact that, under the assumption of a sparse independent prior over x, the model learns projective field parameters that strongly resemble the localized, oriented and bandpass features that characterize V1 neurons when trained on natural images [14, 6]. Hoyer & Hyvärinen (2003) proposed that during inference neural responses can be interpreted as samples in this model. Furthermore, Orban et al. (2016) showed that samples from a closely related generative model (Gaussian Scale Mixture model, [24]) could explain many response properties of V1 neurons beyond receptive fields. Since our main points are conceptual in nature, we will develop them for the slightly simpler original model described above.
Given an image, I, we assume that neural activities can be thought of as samples from the posterior distribution, x⁽ⁱ⁾ ∼ p(x|I) ∝ p(I|x)p_brain(x), where p_brain(x) is the brain's prior over x. In this model each population response, x = (x₁, . . . , x_N)ᵀ, represents a sample from the brain's posterior belief about x|I. Each xₙ, individually, then represents the brain's marginal belief about the intensity of the feature PFₙ in the image. This interpretation is independent of any task demands or assumptions by the experimenter. It is up to the experimenter to infer the nature of the variables encoded by some population of neurons from their responses, e.g. by fitting this model to data. In the next section we will show how these samples can also be interpreted as a population code over some experimenter-defined quantity like orientation (Figure 1).

3 Neural samples form a Probabilistic Population Code (PPC)

In many classic neurophysiology experiments [17], the experimenter presents images that only vary along a single experimenter-defined dimension, e.g. orientation. We call this dimension the quantity of interest, or s.
The question is then posed: what can be inferred about s given the neural activity in response to a single image representing s, x ∼ p(x|s)? An ideal observer would simply apply Bayes' rule to infer p(s|x) ∝ p(x|s)p(s) using its knowledge of the likelihood, p(x|s), and prior knowledge, p(s). We will now derive this posterior over s as implied by the samples drawn from our model in section (2).
We assume the image as represented by the brain's sensory periphery (retinal ganglion cells) can be written as

p(I|s) = N(I; T(s), σ²_exp→brain 𝟙)    (3)

where T is the experimenter-defined function that translates the scalar quantity of interest, s, into an actual image, I. T could represent a grating of a particular spatial frequency and contrast, or any other shape that is being varied along s in the course of the experiment. We further allow for Gaussian pixel noise with variance σ²_exp→brain around the template T(s) in order to model both external noise (which is sometimes added by experimentalists to vary the informativeness of the image) and noise internal to the brain (e.g. sensor noise).
Let us now consider a single neural sample x⁽ⁱ⁾ drawn from the brain's posterior conditioned on an image I. From the linear Gaussian generative model in equation (2), the likelihood of a single sample is

p(I|x⁽ⁱ⁾) = N(I; Ax⁽ⁱ⁾, σ²ₓ𝟙).

The probability of drawing t independent samples² of x is

p(x⁽¹⁾, . . . , x⁽ᵗ⁾|I) = ∏ᵢ₌₁ᵗ p(x⁽ⁱ⁾|I) = ∏ᵢ₌₁ᵗ p(I|x⁽ⁱ⁾)p_brain(x⁽ⁱ⁾)/p_brain(I) = 1/p_brain(I)ᵗ ∏ᵢ₌₁ᵗ p(I|x⁽ⁱ⁾)p_brain(x⁽ⁱ⁾).

Since the experimenter and the brain have different generative models, the priors over the variables depend on the generative model that they are part of (specified by the subscript of their pdf). Substituting in the Gaussian densities and combining all terms that depend on x but not on I into κ(x⁽¹⁾, . . . , x⁽ᵗ⁾), we get

p(x⁽¹⁾, . . . , x⁽ᵗ⁾|I) = κ(x⁽¹⁾, . . . , x⁽ᵗ⁾) · 1/p_brain(I)ᵗ · N(I; Ax̄, (σ²ₓ/t)𝟙)    (4)

where x̄ = (1/t)Σᵢ₌₁ᵗ x⁽ⁱ⁾ is the mean activity of the units over time.
We next derive the posterior over samples given the experimenter-defined stimulus s:

p(x⁽¹⁾, . . . , x⁽ᵗ⁾|s) = ∫ p(x⁽¹⁾, . . . , x⁽ᵗ⁾|I) p(I|s) dI.    (5)

Substituting in our result from equation (4), we obtain

p(x⁽¹⁾, . . . , x⁽ᵗ⁾|s) = κ(x⁽¹⁾, . . . , x⁽ᵗ⁾) ∫ 1/p_brain(I)ᵗ N(I; Ax̄, (σ²ₓ/t)𝟙) p(I|s) dI.

Making use of equation (3) we can write

p(x⁽¹⁾, . . . , x⁽ᵗ⁾|s) = κ(x⁽¹⁾, . . . , x⁽ᵗ⁾) ∫ 1/p_brain(I)ᵗ N(I; Ax̄, (σ²ₓ/t)𝟙) N(I; T(s), σ²_exp→brain 𝟙) dI
= κ(x⁽¹⁾, . . . , x⁽ᵗ⁾) N[T(s); Ax̄, (σ²_exp→brain + σ²ₓ/t)𝟙] ∫ 1/p_brain(I)ᵗ N[I; (T(s)σ²ₓ + Ax̄tσ²_exp→brain)/(tσ²_exp→brain + σ²ₓ), (σ²ₓσ²_exp→brain)/(tσ²_exp→brain + σ²ₓ) 𝟙] dI.

As the number of samples, t, increases, the variance of the Gaussian inside the integral converges to zero, so that for large t we can approximate the integral by the integrand's value at the mean of the Gaussian. The Gaussian's mean itself converges to Ax̄, so that we obtain

p(x⁽¹⁾, . . . , x⁽ᵗ⁾|s) ≈ κ(x⁽¹⁾, . . . , x⁽ᵗ⁾) N[T(s); Ax̄, σ²_exp→brain 𝟙] · 1/p_brain(Ax̄)ᵗ.

Applying Bayes' rule and absorbing all terms that do not contain s into the proportionality, we find that in the limit of infinitely many samples

p(s|x⁽¹⁾, . . . , x⁽ᵗ⁾) ∝ N(T(s); Ax̄, σ²_exp→brain 𝟙) p_exp(s).

We can now rewrite this expression in the canonical form for the exponential family,

p(s|x⁽¹⁾, . . . , x⁽ᵗ⁾) ∝ g(s) exp(h(s)ᵀx̄)    (6)

where

g(s) = exp(−T(s)ᵀT(s)/(2σ²_exp→brain)) p_exp(s)    (7)

and

h(s) = T(s)ᵀA/σ²_exp→brain.    (8)

If x⁽ⁱ⁾ is represented by neural responses (either spikes or instantaneous rates), x̄ becomes the vector of mean firing rates (r) of the population up to time t. Hence, in the limit of many samples, the neural responses form a linear PPC (equation (1)).

²Depending on how the samples are being generated, consecutive samples are likely to be correlated to some degree. However, the central result derived in this section, which is valid for infinitely many samples, still holds due to the possibility of thinning in this case. Only for the finite sample case will autocorrelations lead to deviations from the solutions derived here.

Finite number of samples

The top row of Figure 2 shows a numerical approximation to the posterior over s for the finite sample case and illustrates its convergence for t → ∞ for the example model described in the previous section.
As expected, posteriors for small numbers of samples are both wide and variable, and they get sharper and less variable as the number of samples increases (three runs are shown for each condition). Since the sample mean (x̄) depends only on the marginals over x, we can approximate it using the mean-field solution for our image model. The bottom row of Figure 2 shows the corresponding population responses: the spike count of each neuron on the y-axis, sorted by the neuron's preferred stimulus on the x-axis.

Figure 2: a-c) Posterior over s for three runs (colored) and the expected posterior across many runs (black) for increasing numbers of samples. d) All runs converge to the same posterior (black). Posterior decoded from a mean-field Variational Bayes (VB) approximation to asymptotic firing rates in orange. e-h) Same simulations as in a-d, but now plotting population spike counts sorted by each neuron's preferred orientation. Note that the counting window scales with the number of samples across panels. Panel h shows the VB approximation to asymptotic firing rates in orange.

Interpretation of the implied PPC

The relationships that we have derived for g(s) and h(s) (equations (7-8)) provide insights into the nature of the PPC that arises in a linear Gaussian model of the inputs. A classic stimulus to consider when probing and modeling neurons in area V1 is orientation. If the presented images are identical up to orientation, and if the prior distribution over presented orientations is flat, then g(s) will be constant. Equation (7) shows how g(s) changes when either of those conditions does not apply, for instance when considering stimuli like spatial frequency or binocular disparity for which the prior significantly deviates from constant.
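The decoding implied by equations (7-8) can be sketched numerically. All quantities below are hypothetical stand-ins of our own choosing (a 1-D "grating" template T(s), random projective fields A, an arbitrary mean response x̄, and a flat prior p_exp(s)); the sketch checks that the exponential-family form agrees with directly evaluating the Gaussian N(T(s); Ax̄, σ²_exp→brain 𝟙) on a grid of s values:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, S = 32, 12, 180                     # pixels, neurons, grid points over s
s_grid = np.linspace(0.0, np.pi, S, endpoint=False)
pix = np.arange(D)

# Hypothetical stand-ins (not fit to anything): each row of T is a template T(s).
T = np.cos(2*np.pi*3*pix[None, :]/D + 2*s_grid[:, None])   # (S, D)
A = rng.normal(size=(D, N))               # projective fields
x_bar = rng.normal(size=N)                # some mean sample vector
sigma_eb = 1.0                            # sigma_{exp->brain}

def normalize(logp):
    p = np.exp(logp - logp.max())
    return p / p.sum()

# Equations (7)-(8) with a flat prior p_exp(s):
log_g = -np.einsum('sd,sd->s', T, T) / (2 * sigma_eb**2)   # log g(s)
H = T @ A / sigma_eb**2                                    # row s holds h(s)^T

ppc = normalize(log_g + H @ x_bar)        # g(s) exp(h(s)^T x_bar), normalized
direct = normalize(-np.sum((T - A @ x_bar)**2, axis=1) / (2 * sigma_eb**2))
# 'ppc' and 'direct' coincide: the ||A x_bar||^2 term is independent of s and
# cancels in the normalization.
```

The agreement is exact up to floating point, since the two expressions differ only by s-independent terms.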
More interestingly, equation (8) tells us how the kernels characterizing each neuron's contribution to the population code over s depend both on the images used, T(s), and on the projective fields, PFₙ, contained in A. Intuitively, the more T(s)ᵀPFₙ depends on s, the more informative that neuron's response is for the posterior over s. Interestingly, equation (8) can be seen as a generalization from a classic feedforward model consisting of independent linear-nonlinear-Poisson (LNP) neurons in which the output nonlinearity is exponential, to a non-factorized model in which neural responses are generally correlated. In this case, h(s) is determined by the projective field, rather than the receptive field, of a neuron (the receptive field, RF, being the linear image kernel in an LNP model of the neuron's response). It has been proposed that each latent's sample may be represented by a linear combination of neural responses [23], which can be incorporated into our model with h(s) absorbing the linear mapping.
Importantly, the kernels, h(s), and hence the nature of the PPC, change both with changes in the experimenter-defined variable, s (e.g. whether it is orientation, spatial frequency, binocular disparity, etc.), and with the set of images, T(s). The h(s) will be different for gratings of different size and spatial frequency, for plaids, and for rotated images of houses, to name a few examples. This means that a downstream area trying to form a belief about s (e.g. a best estimate), or an area combining the information contained in the neural responses x with that contained in another population (e.g. in the context of cue integration), will need to learn the h(s) separately for each task.

Figure 3: The link between s and x is provided by the likelihood defined in image space. a) A manifold defined by T(s) is shown in the space of two example pixels.
The likelihood of any stimulus s for a particular sample x is related to the distance between that x projected into image space, Ax, and s projected into image space, T(s) (up to σ_exp→brain noise). b) Same as a, but for a more complicated manifold. (Illustration only, but note that rotating even a simple grating yields a manifold similar to the one shown here, not that in a.) The location of Ax in this space determines the relative heights of the multiple peaks of the implied posterior over s, shown in panel c.

Multimodality of the PPC

Useful insights can be gained from the fact that – at least in the case investigated here – the implied PPC is crucially shaped by the distance measure in the space of sensory inputs, I, defined by our generative model (see equation 3). Figure 3 illustrates this dependence in pixel space: the posterior for a given value of s is monotonically related to the distance between the image "reconstructed" by the mean sample, x̄, and the image corresponding to that value of s. If this reconstruction lies close enough to the image manifold defined by T(s), then the implied posterior will have a local maximum at the value of s which corresponds to the T(s) closest to Ax̄. Whether p(s|x⁽¹⁾, . . . , x⁽ᵗ⁾) has other local extrema depends on the shape of the T(s)-manifold (compare panels a and b). Importantly, the relative height of the global peak compared to other local maxima will depend on two other factors: (a) the amount of noise in the experimenter-brain channel, represented by σ_exp→brain, and (b) how well the generative model learnt by the brain can reconstruct the T(s) in the first place. For a complete, or overcomplete, model, for instance, Ax̄ will exactly reconstruct the input image in the limit of many samples.
As a result, the brain's likelihood, and hence the implied posterior over s, will have a global maximum at the corresponding s (blue in Figure 3b). However, if the generative model is undercomplete, then Ax̄ may lie far from the T(s) manifold and in fact be approximately equidistant to two or more points on T(s), with the result that the implied posterior over s becomes multimodal, with the possibility that multiple peaks have similar height. While V1's model for monocular images is commonly assumed to be complete or even overcomplete [25], it may be undercomplete for binocular images, where large parts of the binocular image space do not contain any natural images. (Note that the multimodality in the posterior over s discussed here is independent of any multimodality in the posterior over x. In fact, it is easy to see that for an exponential prior and Gaussian likelihood, the posterior p(x|I) is a (truncated) Gaussian and hence unimodal, while the posterior over s may still be multimodal.)

Dissociation of neural variability and uncertainty

It is important to appreciate the difference between the brain's posteriors over x and over s. The former represents a belief about the intensity or absence/presence of individual image elements in the input. The latter represents implicit knowledge about the stimulus that caused the input, given the neural responses. Neural variability, as modeled here, corresponds to variability in the samples x⁽ⁱ⁾ and is directly related to the uncertainty in the posterior over x. The uncertainty over s encoded by the PPC, on the other hand, depends on the samples only through their mean, not their variance. Given sufficiently many samples, the uncertainty over s is determined only by the noise in the channel between experimenter and brain (modeled as external pixel noise plus pixel-wise internal sensor noise added to the template, T(s)).
This means that an experimenter increasing uncertainty over s by increasing external noise should not necessarily expect a corresponding increase in neural variability.

Figure 4: Illustration of the effect of two nuisance variables, luminance (a-b) and contrast (c-d), on the image (a,c) and on the corresponding posteriors over s (b,d). While the posterior is invariant to luminance, it depends on contrast.

Nuisance variables

So far we have ignored the possible presence of nuisance variables beyond individual pixel noise. Such nuisance variables can be internal or external to the brain. Relevant nuisance variables when considering experiments done on V1 neurons include overall luminance, contrast, phase, spatial frequency, etc. (for an illustration of the effect of luminance and contrast see Figure 4). An important question from the perspective of a downstream area in the brain interpreting V1 responses is whether they need to be inferred separately and incorporated in any computations, or whether they leave the PPC invariant and can be ignored.
For any external nuisance variable, we can easily modify the experimenter's model in equation (3) to include a nuisance variable η that modifies the template, T(s, η), and hence the brain's observation, I. This dependency carries through the derivation of the PPC to the end, such that

g(s, η) = exp(−T(s, η)ᵀT(s, η)/(2σ²_exp→brain)) p_exp(s)    and    h(s, η) = T(s, η)ᵀA/σ²_exp→brain.    (9)

As long as T(s, η)ᵀT(s, η) is separable in s and η, the nuisance parameter's influence on g can be absorbed into the proportionality constant. This is clearly the case for contrast as a nuisance variable, as discussed in Ma et al. (2006), but in general it will depend on the experimenter's choice of T whether the separability condition is met.
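The conditions around equation (9) can be checked numerically. In the following sketch (with hypothetical templates, projective fields, and mean response of our own choosing), the nuisance is an overall luminance offset, T(s, η) = T(s) + η·1, applied to zero-mean templates: the η-dependent terms in both g and h(s, η)ᵀx̄ are then independent of s and drop out after normalization, whereas a contrast nuisance, η·T(s), rescales h(s, η) in an s-dependent way and changes the decoded posterior:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, S = 32, 12, 90
pix = np.arange(D)
s_grid = np.linspace(0.0, np.pi, S, endpoint=False)

# Zero-mean "grating" templates: whole spatial periods, so each row sums to 0.
T = np.cos(2*np.pi*2*pix[None, :]/D + 2*s_grid[:, None])   # (S, D)
A = rng.normal(size=(D, N))        # hypothetical projective fields
x_bar = rng.normal(size=N)         # hypothetical mean response
sigma_eb = 1.0

def decode(T_eta):
    """Posterior over s implied by equation (9), with a flat p_exp(s)."""
    logp = (T_eta @ A @ x_bar) / sigma_eb**2 \
           - np.einsum('sd,sd->s', T_eta, T_eta) / (2 * sigma_eb**2)
    p = np.exp(logp - logp.max())
    return p / p.sum()

p_dark = decode(T)                 # luminance eta = 0
p_bright = decode(T + 2.0)         # luminance eta = 2 added to every pixel
p_contrast = decode(3.0 * T)       # contrast nuisance: eta scales the template
# p_dark == p_bright (luminance drops out), but p_contrast differs: the
# contrast scaling of h(s, eta) is not absorbed by the normalization.
```

The luminance invariance here is exact by the algebra above; the contrast case sharpens the posterior around the same peak rather than leaving it unchanged.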
For the PPC over s to be invariant to η, additionally, h(s) needs to be independent of η. For a linear Gaussian model, this is the case when the projective fields making up A = (PF₁, . . . , PF_N) are either invariant to s or to η. For instance, when A is learnt on natural images, this is usually the case for overall luminance (Figure 4a), since one projective field will represent the DC component of any input image while the other projective fields average to zero. So while T(s, η)ᵀPF for the projective field representing the DC component will depend on the image's DC component (overall luminance), it does not depend on other aspects of the image (i.e. s). For projective fields that integrate to zero, however, T(s, η)ᵀPF is independent of η, but may be modulated by s (e.g. orientation, if the projective fields are orientation-selective).
The original PPC described by Ma et al. (2006) was shown to be contrast-invariant since both the "tuning curve" of each neuron, relating to T(s, η)ᵀPF in our case, and the response variance (taking the place of σ²_exp→brain) were assumed to scale linearly with contrast (in line with empirical measurements). For our model, we assumed that σ_exp→brain was independent of the input, and hence the T are not invariant to contrast. However, since the noise characteristics of the brain's sensory periphery (included as sensor noise in our σ_exp→brain term) generally depend on the inputs, it remains a question for future research whether more realistic assumptions about the sensory noise imply an approximately invariant PPC over s.³
Generally speaking, the nature of the PPC will depend on the particular image model that the brain has learnt. For instance, numerical results by Orban et al. (2016) suggest that explicitly including a contrast variable in the image model (Gaussian Scale Mixture, [24]) implies an approximately contrast-invariant PPC over orientation, but how precise and general that finding is remains to be seen analytically.

³In contrast to the interpretation of Ma et al. (2006), where contrast invariance is the result of a combination of mean response scaling and response variance scaling, in our case it would be a combination of the "feedforward" part of the mean response scaling and the scaling of the variability of the inputs.

4 Neurons simultaneously represent both probabilities & log probabilities

Taking the log of equation (6) makes it explicit that the neural responses, x, are linearly related to the log posterior over s. This interpretation agrees with a long list of prior work suggesting that neural responses are linearly related to the logarithm of the probabilities that they represent. This contrasts with a number of proposals, starting with Barlow (1969) [1], in which neural responses are proportional to the probabilities themselves (both schemes are reviewed in [20]). The two schemes have different advantages and disadvantages in terms of computation (e.g. making multiplication and addition particularly convenient, respectively) and are commonly discussed as mutually exclusive. While in our model, with respect to the posterior over x, neural responses generally correspond to samples, i.e. neither probabilities nor log probabilities, they do become proportional to probabilities for the special case of binary latents.
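This claim can be illustrated in a small binary version of the model. The sketch below uses toy sizes of our own choosing, computes the exact posterior by brute-force enumeration of all binary states, and draws i.i.d. samples from it (rather than running a neural sampler): the time-averaged 0/1 responses converge to the marginal posterior probabilities.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
D, N = 16, 8                         # toy sizes: D pixels, N binary latents
A = rng.normal(size=(D, N))          # hypothetical projective fields
sigma_x, p1 = 0.5, 0.3               # pixel noise; Bernoulli prior p(x_n = 1)

x_true = (rng.random(N) < p1).astype(float)
img = A @ x_true + sigma_x * rng.normal(size=D)

# Exact posterior p(x|I) over all 2^N binary states.
states = np.array(list(product([0.0, 1.0], repeat=N)))       # (2^N, N)
log_post = -np.sum((img - states @ A.T)**2, axis=1) / (2 * sigma_x**2) \
           + states.sum(axis=1) * np.log(p1) \
           + (N - states.sum(axis=1)) * np.log(1 - p1)
post = np.exp(log_post - log_post.max())
post /= post.sum()

marginals = post @ states            # exact p(x_n = 1 | I) for each "neuron" n
idx = rng.choice(len(states), size=20000, p=post)
rates = states[idx].mean(axis=0)     # empirical firing rate of each "neuron"
# rates ≈ marginals: the mean 0/1 response of neuron n encodes p(x_n = 1 | I)
```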
In that case, on the time scale of a single sample, the response is either 0 or 1, making the firing rate of neuron n proportional to its marginal probability, p(xₙ = 1|I). Such a binary image model has been shown to be as successful as the original continuous model of Olshausen & Field (1996) in explaining the properties of V1 receptive fields [11, 6], and is supported by studies on the biological implementability of binary sampling [7, 18].
In sum, for the special case of binary latents, responses implied by our neural sampling model are at once proportional to probabilities (over xₙ) and to log probabilities (over s).

5 Discussion

We have shown that sampling-based inference in a simple generative model of V1 can be interpreted in multiple ways, some previously discussed as mutually exclusive. In particular, the neural responses can be interpreted both as samples from the probabilistic model that the brain has learnt for its inputs, and as parameters of the posterior distribution over any experimenter-defined variables that are only implicitly encoded, like orientation. Furthermore, we describe how both a log probability code and a direct probability code can be used to describe the very same system.
The idea of multiple codes present in a single system has been mentioned in earlier work [23, 5], but we make this link explicit by starting with one type of code (sampling) and showing how it can be interpreted as a different type of code (parametric), depending on the variable assumed to be represented by the neurons. Our findings indicate the importance of committing to a model and a set of variables for which the probabilities are computed when comparing alternate coding schemes (e.g.
as\ndone in [9]).\nOur work connects to machine learning in several ways: (1) our distinction between explicit variables\n(which are sampled) and implicit variables (which can be decoded parametrically) is analogous to the\npractice of re-using pre-trained models in new tasks, where the \u201cencoding\u201d is given but the \u201cdecoding\u201d\nis re-learned per task. Furthermore, (2) the nature of approximate inference might be different for\nencoded latents and for other task-relevant decoded variables, given that our model can be interpreted\neither as performing parametric or sampling-based inference. Finally, (3) this suggests a relaxation\nof the commonplace distinction between Monte-Carlo and Variational methods for approximate\ninference [22]. For instance, our model could potentially be interpreted as a mixture of parametric\ndistributions, where the parameters themselves are sampled.\nWe emphasize that we are not proposing that the model analyzed here is the best, or even a particular\ngood model for neural responses in area V1. Our primary goal was to show that the same model can\nsupport multiple interpretations that had previously been thought to be mutually exclusive, and to\nderive analytical relationships between those interpretations.\nThe connection between the two codes speci\ufb01es the dependence of the PPC kernels on how the image\nmanifold de\ufb01ned by the implicit variable interacts with the properties of the explicitly represented\nvariables. It makes explicit how in\ufb01nitely many posteriors over implicit variables can be \u201cdecoded\u201d\nby taking linear projections of the neural responses, raising questions about the parsimony of a\ndescription of the neural code based on implicitly represented variables like orientation.\n\n8\n\n\fWe also note that the PPC that arises from the image model analyzed here is not contrast invariant\nlike the one proposed by Ma et al. 
(2006), which was based on the empirically observed response variability of V1 neurons and the linear contrast scaling of their tuning with respect to orientation.
Of course, a linear Gaussian model is insufficient to explain V1 responses, and it would be interesting to derive the PPC implied by more sophisticated models like a Gaussian Scale Mixture model [24], which is a better model for natural images, enjoys more empirical support, and, based on numerical simulations, may approximate a contrast-invariant linear PPC over orientation [16].
Finally, a more general relationship between the structure of the generative model for the inputs and the invariance properties of PPCs empirically observed for different cortical areas may help extend probabilistic generative models to higher cortical areas beyond V1.

Acknowledgments

This work was supported by NEI/NIH awards R01 EY028811 (RMH) and T32 EY007125 (RDL), as well as an NSF/NRT graduate training grant NSF-1449828 (RDL, SS).

References

[1] HB Barlow. Pattern recognition and the responses of sensory neurons. Annals of the New York Academy of Sciences, 156(2):872–881, 1969.

[2] Jeff Beck, Alexandre Pouget, and Katherine A Heller. Complex inference in neural circuits with probabilistic population codes and topic models. In Advances in Neural Information Processing Systems, pages 3059–3067, 2012.

[3] Jeffrey M Beck, Peter E Latham, and Alexandre Pouget. Marginalization in neural circuits with divisive normalization. Journal of Neuroscience, 31(43):15310–15319, 2011.

[4] Jeffrey M Beck, Wei Ji Ma, Roozbeh Kiani, Tim Hanks, Anne K Churchland, Jamie Roitman, Michael N Shadlen, Peter E Latham, and Alexandre Pouget. Probabilistic population codes for Bayesian decision making. Neuron, 60(6):1142–1152, 2008.

[5] Johannes Bill, Lars Buesing, Stefan Habenschuss, Bernhard Nessler, Wolfgang Maass, and Robert Legenstein.
Distributed Bayesian computation and self-organized learning in sheets of spiking neurons with local lateral inhibition. PLoS One, 10(8):e0134356, 2015.

[6] Jörg Bornschein, Marc Henniges, and Jörg Lücke. Are V1 simple cells optimized for visual occlusions? A comparative study. PLoS Computational Biology, 9(6):e1003062, 2013.

[7] Lars Buesing, Johannes Bill, Bernhard Nessler, and Wolfgang Maass. Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology, 7(11):e1002211, 2011.

[8] József Fiser, Pietro Berkes, Gergő Orbán, and Máté Lengyel. Statistically optimal perception and learning: from behavior to neural representations. Trends in Cognitive Sciences, 14(3):119–130, 2010.

[9] Agnieszka Grabska-Barwinska, Jeff Beck, Alexandre Pouget, and Peter Latham. Demixing odors – fast inference in olfaction. In Advances in Neural Information Processing Systems, pages 1968–1976, 2013.

[10] Ralf M Haefner, Pietro Berkes, and József Fiser. Perceptual decision-making as probabilistic inference by neural sampling. Neuron, 90(3):649–660, 2016.

[11] Marc Henniges, Gervasio Puertas, Jörg Bornschein, Julian Eggert, and Jörg Lücke. Binary sparse coding. In International Conference on Latent Variable Analysis and Signal Separation, pages 450–457. Springer, 2010.

[12] Patrik O Hoyer and Aapo Hyvärinen. Interpreting neural response variability as Monte Carlo sampling of the posterior. In Advances in Neural Information Processing Systems, pages 293–300, 2003.

[13] Wei Ji Ma, Jeffrey M Beck, Peter E Latham, and Alexandre Pouget. Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11):1432, 2006.

[14] Bruno A Olshausen and David J Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images.
Nature, 381(6583):607, 1996.

[15] Bruno A Olshausen and David J Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.

[16] Gergő Orbán, Pietro Berkes, József Fiser, and Máté Lengyel. Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron, 92(2):530–543, 2016.

[17] Andrew J Parker and William T Newsome. Sense and the single neuron: probing the physiology of perception. Annual Review of Neuroscience, 21(1):227–277, 1998.

[18] Dejan Pecevski, Lars Buesing, and Wolfgang Maass. Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons. PLoS Computational Biology, 7(12):e1002294, 2011.

[19] Xaq Pitkow and Dora E Angelaki. Inference in the brain: statistics flowing in redundant population codes. Neuron, 94(5):943–953, 2017.

[20] Alexandre Pouget, Jeffrey M Beck, Wei Ji Ma, and Peter E Latham. Probabilistic brains: knowns and unknowns. Nature Neuroscience, 16(9):1170, 2013.

[21] Rajkumar Vasudeva Raju and Xaq Pitkow. Inference by reparameterization in neural population codes. In Advances in Neural Information Processing Systems, pages 2029–2037, 2016.

[22] Tim Salimans, Diederik Kingma, and Max Welling. Markov chain Monte Carlo and variational inference: Bridging the gap. In International Conference on Machine Learning, pages 1218–1226, 2015.

[23] Cristina Savin and Sophie Deneve. Spatio-temporal representations of uncertainty in spiking neural networks. In Advances in Neural Information Processing Systems, pages 2024–2032, 2014.

[24] Odelia Schwartz and Eero P Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819, 2001.

[25] Eero P Simoncelli and Bruno A Olshausen. Natural image statistics and neural representation.
Annual Review of Neuroscience, 24(1):1193–1216, 2001.