{"title": "Probabilistic Computation in Spiking Populations", "book": "Advances in Neural Information Processing Systems", "page_first": 1609, "page_last": 1616, "abstract": null, "full_text": "Probabilistic computation in spiking populations\n\n\n\n      Richard S. Zemel      Quentin J. M. Huys          Rama Natarajan        Peter Dayan\n     Dept. of Comp. Sci.        Gatsby CNU             Dept. of Comp. Sci.    Gatsby CNU\n      Univ. of Toronto              UCL                 Univ. of Toronto          UCL\n\n\n\n                                           Abstract\n\n           As animals interact with their environments, they must constantly update\n           estimates about their states. Bayesian models combine prior probabil-\n           ities, a dynamical model and sensory evidence to update estimates op-\n           timally. These models are consistent with the results of many diverse\n           psychophysical studies. However, little is known about the neural rep-\n           resentation and manipulation of such Bayesian information, particularly\n           in populations of spiking neurons. We consider this issue, suggesting a\n           model based on standard neural architecture and activations. We illus-\n           trate the approach on a simple random walk example, and apply it to\n           a sensorimotor integration task that provides a particularly compelling\n           example of dynamic probabilistic computation.\n\n\nBayesian models have been used to explain a gamut of experimental results in tasks which\nrequire estimates to be derived from multiple sensory cues. 
These include a wide range of psychophysical studies of perception [13], motor action [7], and decision-making [3, 5]. Central to Bayesian inference is that computations are sensitive to uncertainties about afferent and efferent quantities, arising from ignorance, noise, or inherent ambiguity (e.g., the aperture problem), and that these uncertainties change over time as information accumulates and dissipates. Understanding how neurons represent and manipulate uncertain quantities is therefore key to understanding the neural instantiation of these Bayesian inferences.

Most previous work on representing probabilistic inference in neural populations has focused on the representation of static information [1, 12, 15]. These models encompass various strategies for encoding and decoding uncertain quantities, but do not readily generalize to real-world dynamic information processing tasks, particularly the most interesting cases in which stimuli change over the same timescale as spiking itself [11]. Notable exceptions are the recent, seminal, but, as we argue, representationally restricted, models proposed by Gold and Shadlen [5], Rao [10], and Deneve [4].

In this paper, we first show how probabilistic information varying over time can be represented in a spiking population code. Second, we present a method for producing spiking codes that facilitate further processing of the probabilistic information. Finally, we show the utility of this method by applying it to a temporal sensorimotor integration task.

1 TRAJECTORY ENCODING AND DECODING

We assume that population spikes R(t) arise stochastically in relation to the trajectory X(t) of an underlying (but hidden) variable. We use R_T and X_T for the whole spike train and trajectory, respectively, from time 0 to T.
The spikes R_T constitute the observations and are assumed to be probabilistically related to the signal by a tuning function f(X, θ_i):

    P(R(i, T) | X(T)) ∝ f(X, θ_i)                                                  (1)

for the spike train of the i-th neuron, with parameters θ_i. Therefore, via standard Bayesian inference, R_T determines a distribution over the hidden variable at time T, P(X(T) | R_T).

We first consider a version of the dynamics and input coding that permits an analytical examination of the impact of spikes. Let X(t) follow a stationary Gaussian process such that the joint distribution P(X(t_1), X(t_2), ..., X(t_m)) is Gaussian for any finite collection of times, with a covariance matrix which depends on time differences: C_{tt′} = c(|t − t′|). The function c(|t|) controls the smoothness of the resulting random walks. Then,

    P(X(T) | R_T) ∝ p(X(T)) ∫ dX_T P(R_T | X_T) P(X_T | X(T))                      (2)

where P(X_T | X(T)) is the distribution over the whole trajectory X_T conditional on the value of X(T) at its end point. If R_T is a set of conditionally independent inhomogeneous Poisson processes, we have

    P(R_T | X_T) ∝ ∏_i ∏_ξ f(X(t_i^ξ), θ_i) exp( −Σ_i ∫ dτ f(X(τ), θ_i) )          (3)

where the t_i^ξ are the spike times of neuron i in R_T. Let χ = [X(t_i^ξ)] be the vector of stimulus positions at the times at which we observed a spike and Θ = [θ(t_i^ξ)] be the vector of spike positions.
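The encoding model above can be sketched in a short simulation: a discretized drift-diffusion trajectory (a concrete stationary Gaussian process) drives a population of inhomogeneous Poisson neurons. Gaussian tuning curves and all parameter values here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized drift-diffusion for the hidden trajectory X(t)
# (alpha and sigma are illustrative values).
dt, T_steps = 0.001, 1000
alpha, sigma = 0.2, 2.0
X = np.zeros(T_steps)
for t in range(1, T_steps):
    X[t] = X[t-1] - alpha * X[t-1] * dt + sigma * np.sqrt(dt) * rng.standard_normal()

# Population of neurons with Gaussian tuning f(X, theta_i);
# preferred positions theta_i tile the space.
theta = np.linspace(-1, 1, 20)     # preferred stimuli theta_i
sig_tc, gain = 0.2, 50.0           # tuning width and peak rate (assumed)
rates = gain * np.exp(-(X[:, None] - theta[None, :])**2 / (2 * sig_tc**2))

# Conditionally independent inhomogeneous Poisson spiking (cf. Equation 3):
# in a short bin of width dt, P(spike) is approximately rate * dt.
spikes = rng.random(rates.shape) < rates * dt   # shape (T_steps, n_neurons)
```

Each column of `spikes` is one neuron's spike train R(i, ·); together they constitute the observations R_T from which the posterior over X(T) is computed.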
If the tuning functions are Gaussian, f(X, θ_i) ∝ exp(−(X − θ_i)²/2σ²), and sufficiently dense that Σ_i ∫ dτ f(X, θ_i) is independent of X (a standard assumption in population coding), then P(R_T | X_T) ∝ exp(−‖χ − Θ‖²/2σ²), and in Equation 2 we can marginalize out X_T except at the spike times t_i^ξ:

    P(X(T) | R_T) ∝ p(X(T)) ∫ dχ exp( −[χ, X(T)]ᵀ C⁻¹ [χ, X(T)]/2 − ‖χ − Θ‖²/2σ² )   (4)

where C is the block covariance matrix between X(t_i^ξ) and X(T) at the spike times [t_i^ξ] and the final time T. This Gaussian integral gives P(X(T) | R_T) ~ N(μ(T), σ²(T)), with

    μ(T) = C_{Tt}(C_{tt} + Iσ²)⁻¹Θ = kΘ          σ²(T) = C_{TT} − kC_{tT}           (5)

C_{TT} is the (T, T) element of the covariance matrix and C_{Tt} is similarly a row vector. The dependence of μ on past spike times is specified chiefly by the inverse covariance matrix, and acts as an effective kernel k. This kernel is not stationary, since it depends on factors such as the local density of spiking in the spike train R_T.

For example, consider the case in which X(t) evolves according to a diffusion process with drift:

    dX = −αX dt + σ dN(t)                                                           (6)

where α prevents X from wandering too far, and N(t) is white Gaussian noise with mean zero and variance σ². Figure 1A shows sample kernels for this process.

Inspection of Figure 1A reveals some important traits. First, the kernel magnitude decreases monotonically as the time span between a spike and the current time T grows, matching the intuition that recent spikes play a more significant role in determining the posterior over X(T).
Second, the kernel is nearly exponential, with a time constant that depends on the time constant of the covariance function and the density of the spikes; two settings of these parameters produced the two groupings of kernels in the figure. Finally, the fully adaptive kernel k can be locally well approximated by a metronomic kernel k⟨R⟩ (shown in red in Figure 1A) that assumes regular spiking. This takes advantage of the general fact, indicated by the grouping of kernels, that the kernel depends weakly on the actual spike pattern, but strongly on the average rate. The merits of the metronomic kernel are that it is stationary and depends only on a single mean rate rather than the full spike train R_T. It also justifies the form of decoder used for the network model in the next section [6].

[Figure 1 appears here. Panels: A, kernels k and k^s (kernel size vs. t − t_spike); B, variance ratio over time; C, true stimulus and means; D, full kernel; E, regular, stationary kernel (space vs. time).]

Figure 1: Exact and approximate spike decoding with the Gaussian process prior. Spikes are shown in yellow, the true stimulus in green, and P(X(T)|R_T) in gray. Blue: exact inference with nonstationary kernel; red: approximate inference with regular spiking. A: Kernel samples for a diffusion process as defined by Equations 5, 6. B, C: Mean and variance of the inference. D: Exact inference with full kernel k; E: approximation based on metronomic kernel k⟨R⟩ (Equation 7).

Figure 1D shows an example of how well Equation 5 specifies a distribution over X(t) from very few spikes.

Finally, Figure 1E shows a factorized approximation with the stationary kernel, similar to that used by Hinton and Brown [6] and in our recurrent network:

    P̂(X(t) | R_t) ∝ ∏_i f(X, θ_i)^{ Σ_{j=0}^{t} k^s_{t−j} R(i, j) } = exp(−E(X(t), R_t, t))   (7)

By design, the mean is captured very well, but not the variance, which in this example grows too rapidly for long interspike intervals (Figure 1B, C). Using a slower kernel improves performance on the variance, but at the expense of the mean. We thus turn to the network model with recurrent connections that are available to reinstate the spike-conditional characteristics of the full kernel.

2 NETWORK MODEL FORMULATION

Above we considered how population spikes R_T specify a distribution over X(T). We now extend this to consider how interconnected populations of neurons can specify distributions over time-varying variables. We frame the problem and our approach in terms of a two-level network, connecting one population of neurons to another; this construction is intended to apply to any level of processing.
The network maps input population spikes R(t) to output population spikes S(t), where input and output evolve over time. As with the input spikes, S_T denotes the output spike trains from time 0 to T, and these output spikes are assumed to determine a distribution over a related hidden variable.

For the recurrent and feedforward computation in the network, we start with the deceptively simple goal [9] of producing output spikes such that the distribution Q(X(T)|S_T) they imply over the same hidden variable X(T) as the input faithfully matches P(X(T)|R_T). This might seem a strange goal, since one could surely just listen to the input spikes. However, in order for the output spikes to track the hidden variable, the dynamics of the interactions between the neurons must explicitly capture the dynamics of the process X(T). Once this `identity mapping' problem has been solved, more general, complex computations can be performed with ease. We illustrate this on a multisensory integration task, tracking a hidden variable that depends on multiple sensory cues.

The aim of the recurrent network is to take the spikes R(t) as inputs, and produce output spikes that capture the probabilistic dynamics. We proceed in two steps. We first consider the probabilistic decoding process which turns S_T into Q(X(t)|S_T). Then we discuss the recurrent and feedforward processing that produce appropriate S_T given this decoder.
Note that this decoding process is not required for the network processing; it instead provides a computational objective for the spiking dynamics in the system.

We use a simple log-linear decoder based on a spatiotemporal kernel [6]:

    Q(X(T) | S_T) ∝ exp(−E(X(T), S_T, T)) , where                                  (8)

    E(X, S_T, T) = Σ_j Σ_{τ=0}^{T} S(j, T − τ) φ_j(X, τ)                           (9)

is an energy function, and the spatiotemporal kernels are assumed separable: φ_j(X, τ) = g_j(X) φ(τ). The spatial kernel g_j(X) is related to the receptive field f(X, θ_j) of neuron j, and the temporal kernel φ(τ) to k⟨R_T⟩.

The dynamics of processing in the network follow a standard recurrent neural architecture for modeling cortical responses, in the case that network inputs R(t) and outputs S(t) are spikes. The effect of a spike on other neurons in the network is assumed to have some simple temporal dynamics, described here again by the temporal kernel φ(τ):

    r_i(t) = Σ_{τ=0}^{T′} R(i, t − τ) φ(τ)          s_j(t) = Σ_{τ=0}^{T′} S(j, t − τ) φ(τ)

where T′ is the extent of the kernel.
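To make Equations 8–9 concrete, here is a minimal sketch of the log-linear decoder. The exponential temporal kernel and the spatial kernel g_j(X) = |X − X_j|²/(1 + |X − X_j|²) are the forms used in the experiments of Section 4; the grid, unit locations, and spike pattern are illustrative assumptions.

```python
import numpy as np

def decode(S, X_grid, X_units, lam=2.0):
    """Log-linear spatiotemporal decoder Q(X|S_T) ~ exp(-E) (Equations 8-9).

    S: (T, J) array of output spikes; X_grid: candidate stimulus values;
    X_units: preferred locations X_j of the J output units.
    Separable kernel: phi_j(X, tau) = g_j(X) * phi(tau).
    """
    T, J = S.shape
    phi = np.exp(-lam * np.arange(T))          # temporal kernel phi(tau)
    d2 = (X_grid[:, None] - X_units[None, :])**2
    g = d2 / (1.0 + d2)                        # spatial kernel, (n_grid, J)
    # filtered spike counts: sum_tau S(j, T - tau) * phi(tau)
    s = phi @ S[::-1]                          # shape (J,)
    E = g @ s                                  # energy E(X, S_T, T)
    Q = np.exp(-E)
    return Q / Q.sum()

# usage: 3 output units; only the unit at 0.0 spiked recently
X_grid = np.linspace(-1, 1, 101)
X_units = np.array([-0.5, 0.0, 0.5])
S = np.zeros((50, 3)); S[-5:, 1] = 1.0
Q = decode(S, X_grid, X_units)
# the decoded distribution peaks near the spiking unit's location (0.0)
```

Because the energy grows with distance from a recently spiking unit, Q concentrates probability mass near that unit's preferred location, with the exponential kernel discounting older spikes.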
The response of an output neuron is governed by a stochastic spiking rule, in which the probability that neuron j spikes at time t is given by:

    P(S(j, t) = 1) = σ(u_j(t)) = σ( Σ_i w_ij r_i(t) + Σ_k v_kj s_k(t − 1) )        (10)

where σ(·) is the logistic function, and W and V are the feedforward and recurrent weights. If φ(τ) = exp(−τ), then u_j(T) = φ(0)(W_j · R(T) + V_j · S(T)) + φ(1) u_j(T − 1); this corresponds to a discretization of the standard dynamics for the membrane potential of a leaky integrate-and-fire neuron, γ du_j/dt = −u_j + W·R + V·S, where the leak γ is determined by the temporal kernel.

The task of the network is to make Q(X(T)|S_T) of Equation 8 match P(X(T)|R_T) coming from one of the two models above (exact dynamic or approximate stationary kernel). We measure the discrepancy using the Kullback-Leibler (KL) divergence:

    J = Σ_t KL[ P(X(t)|R_t) ‖ Q(X(t)|S_t) ]                                        (11)

and, as a proof of principle in the experiments below, find optimal W and V by minimizing the KL divergence J using back-propagation through time (BPTT). In order to implement this in the most straightforward way, we convert the stochastic spiking rule (Equation 10) to a deterministic rule via the mean-field assumption: S_j(t) = σ( Σ_i w_ij r_i(t) + Σ_k v_kj s_k(t − 1) ). The gradients are tedious, but can be neatly expressed in a temporally recursive form. Note that our current focus is on the representational capability of the system, rather than its learning.
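A minimal sketch of the deterministic (mean-field) forward pass that BPTT would be trained through, with random weights standing in for learned ones; the network sizes, weight scales, and kernel constant are illustrative assumptions.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(R, W, V, lam=2.0):
    """Mean-field network dynamics (Equation 10, deterministic rule).

    R: (T, I) input spikes; W: (I, J) feedforward; V: (J, J) recurrent.
    The recursion r(t) = R(t) + exp(-lam) * r(t-1) implements filtering
    by the exponential temporal kernel phi(tau) = exp(-lam * tau).
    """
    T, I = R.shape
    J = W.shape[1]
    decay = np.exp(-lam)
    r = np.zeros(I)          # filtered input traces r_i(t)
    s = np.zeros(J)          # filtered output traces s_j(t)
    S = np.zeros((T, J))     # mean-field outputs S_j(t) in (0, 1)
    for t in range(T):
        r = R[t] + decay * r
        S[t] = sigmoid(r @ W + s @ V)   # spike probability under mean field
        s = S[t] + decay * s            # uses outputs up to time t
    return S

rng = np.random.default_rng(1)
R = (rng.random((100, 20)) < 0.05).astype(float)   # sparse input spikes
W = 0.1 * rng.standard_normal((20, 20))
V = 0.1 * rng.standard_normal((20, 20))
S = forward(R, W, V)
```

Replacing the mean-field outputs S[t] with Bernoulli samples recovers the stochastic spiking rule of Equation 10.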
Our results establish that the system can faithfully represent the posterior distribution. We return to the issue of more plausible learning rules below.

The resulting network can be seen as a dynamic spiking analogue of the recurrent network scheme of Pouget et al. [9]: both methods formulate feedforward and recurrent connections so that a simple decoding of the output can match optimal but complex decoding applied to the inputs. A further advantage of the scheme proposed here is that it facilitates downstream processing of the probabilistic information, as the objective encourages the formation of distributions at the output that factorize across the units.

3 RELATED MODELS

Ideas about the representation of probabilistic information in spiking neurons are in vogue. One treatment considers Poisson spiking in populations with regular tuning functions, assuming that stimuli change slowly compared with the inter-spike intervals [8]. This leads to a Kalman filter account with much formal similarity to the models of P(X(T)|R_T). However, because of the slow timescale, recurrent dynamics can be allowed to settle to an underlying attractor. In another approach, the spiking activity of either a single neuron [4] or a pair of neurons [5] is considered as reporting (logarithmic) probabilistic information about an underlying binary hypothesis. A third treatment proposes that a population of neurons directly represents the (logarithmic) probability over the state of a hidden Markov model [10].

Our method is closely related to the latter two models. Like Deneve's [4], we consider the transformation of input spikes to output spikes with a fixed assumed decoding scheme, so that the dynamics of an underlying process are captured.
Our decoding mechanism produces something like the predictive coding apparent in Deneve's scheme, except that here a neuron need not spike, not only when it has itself recently spiked and thereby conveyed the appropriate information, but also when one of its population neighbors has recently spiked. This is explicitly captured by the recurrent interactions among the population. Our scheme also resembles Rao's approach [10] in that it involves population codes. Our representational scheme is more general, however, in that the spatiotemporal decoder defines the relationship between output spikes and Q(X(T)|S_T), whereas his method assumes a direct encoding, with each output neuron's activity proportional to log Q(X(T)|S_T). Our decoder can produce such a direct encoding if the spatial and temporal kernels are delta functions, but other kernels permit coordination amongst the population to take into account temporal effects, and to produce higher fidelity in the output distribution.

4 EXPERIMENTS

1. Random walk. We describe two experiments. For ease of presentation and comparison, these simulations treat X(t) as a discrete variable, so that the encoding model is a hidden Markov model (HMM) rather than the Gaussian process defined above. The first is a random walk, as in Equation 6 and Figure 1, which allows us to make comparisons with the exact statistics. In a discrete setting, the walk parameters α and σ determine the entries in the transition matrix of the corresponding HMM; in a continuous one, the covariance function C of the Gaussian process. Input spikes are generated according to Gaussian tuning functions (Equation 1). In the recurrent network model, the spatiotemporal kernels are fixed: the spatial kernels are based on the regular locations of the output units, g_j(X) = |X − X_j|²/(1 + |X − X_j|²), and the temporal kernel is φ(τ) = exp(−λτ), where λ = 2 is set to match the walk dynamics.
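One way to build the HMM transition matrix from the walk of Equation 6 is to place each state's transition probabilities on a Gaussian around its drifted mean; this particular discretization (and the grid range and time step) is our assumption for illustration, not a construction given in the paper.

```python
import numpy as np

def walk_transition_matrix(n_states=60, alpha=0.2, sigma=2.0, dt=0.01,
                           lo=-1.0, hi=1.0):
    """Discretize the drift-diffusion walk dX = -alpha*X*dt + sigma*dN(t)
    (Equation 6) into an HMM transition matrix over a grid of states.
    The discretization scheme is an illustrative assumption.
    """
    x = np.linspace(lo, hi, n_states)
    mean = x - alpha * x * dt            # drift pulls states toward 0
    sd = sigma * np.sqrt(dt)             # diffusion per time step
    # P(x' | x) proportional to a Gaussian around the drifted mean,
    # renormalized over the finite grid
    P = np.exp(-(x[None, :] - mean[:, None])**2 / (2 * sd**2))
    return P / P.sum(axis=1, keepdims=True)

P = walk_transition_matrix()   # rows are current states, columns next states
```

Larger α concentrates each row's mass toward the center of the grid; larger σ spreads it out, matching the qualitative effect of these parameters on the walks.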
In the following simulations, the network contained 20 inputs, 60 states, and 20 outputs.

Results on two walk trajectories with different dynamics are shown in Figure 2. The network is trained on walks with parameters (α = 0.2, σ = 2) that force the state to move to and remain near the center. Figures 2A & B show that in intervals without input spikes, the inferred mean quickly shifts towards the center and remains there until evidence is received in the form of input spikes. The feedforward weights (Fig. 2E) show strong connections between an input unit and its corresponding output, while the learned recurrent weights (Fig. 2F) reflect the transition probabilities: units coding for extreme values have strong connections to those nearer the center, and units with preferred values near the center have strong self-connections. Figures 2C & D show the results of testing this trained network on walks with different dynamics (α = 0.8, σ = 7). The resulting mismatch between the mean approximated trajectory (yellow line) and the true stimulus (dashed line) (Fig. 2D), and between the variances (Fig. 2H), shows that the learned weights capture the dynamics of the training walks.

[Figure 2 appears here. Panels A–D: inference results for the two walks; E: feedforward weights (input neurons × output neurons); F: recurrent weights (output neurons × output neurons); G, H: variance of encoded and decoded distributions over time, comparing full inference and the approximation.]

Figure 2: Comparison between full inference using the hidden Markov model and the approximation using the network model. Top Row: Full inference (A, C) and approximation (B, D) results from two walks. Input spikes (R_T) are shown as green circles; output spikes (S_T > 0.9) as magenta stars; true stimulus as dashed line; mean inferred trajectory as red line; mean approximated trajectory as yellow line; distributions P(X(t)|R_T) and Q(X(t)|S_T) at each timestep in gray.
Bottom Row: Feedforward weights (E); recurrent weights (F); variance of exact and approximate inference from walks 1 (G) and 2 (H).

2. Sensorimotor task. We next applied our framework to a recent experiment on probabilistic computation during sensorimotor processing.7 Here, human subjects tried to move a cursor on a display to a target by moving a (hidden) finger. The cursor was shown before the start of the movement and then hidden, except for one point of blurry visual feedback in the middle of the movement (with variances σ0 = 0 < σM < σL < σ∞ = ∞). Unbeknownst to the subjects, at the onset of movement the cursor was displaced by ∆X, drawn from a prior distribution P(∆X). The subjects had to estimate ∆X in order to compensate for the displacement and land the cursor on the target. The key result is that subjects learned and used the prior information P(∆X), and indeed integrated it with the visual information in a way that was appropriately sensitive to the amount of blur (Figure 3A). The authors showed that a simple Bayesian model could account for these data.

We model a population representation of the 2D cursor position X(t) on the screen. Two spiking input codes--from vision (R^v_T) and proprioception (R^p_T), the latter present also in the absence of visual feedback--are mapped into an output code ST representing P(X(t)|R^v_T, R^p_T).
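For reference, the simple Bayesian model of this task reduces, for a Gaussian prior and a Gaussian likelihood, to precision-weighted averaging of the prior mean and the observed displacement. A minimal sketch (the observation value and blur variances below are illustrative, not the experimental parameters):

```python
import numpy as np

def combine(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian prior times a Gaussian likelihood."""
    if np.isinf(obs_var):      # no visual feedback: the prior dominates completely
        return prior_mean, prior_var
    if obs_var == 0.0:         # perfectly sharp feedback: the prior is ignored
        return obs, 0.0
    w = prior_var / (prior_var + obs_var)           # weight on the observation
    post_mean = (1.0 - w) * prior_mean + w * obs
    post_var = prior_var * obs_var / (prior_var + obs_var)
    return post_mean, post_var

# Prior on the displacement: N(1, 0.5), as in the first experiment.
# The observed (blurred) displacement 2.0 is a hypothetical value.
for obs_var in [0.0, 0.5, 2.0, np.inf]:             # increasing blur (illustrative)
    mean, var = combine(1.0, 0.5, 2.0, obs_var)
    print(f"obs_var={obs_var}: posterior mean {mean:.2f}")
```

As the feedback variance grows from 0 to infinity, the posterior mean slides from the observation (prior ignored) to the prior mean (prior dominates), the qualitative pattern in Figure 3.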
The mapping is a neural instantiation of Bayesian cue combination, and also an extension of the previous model to the dynamic case.

The first experiment involved a Gaussian prior distribution, P(∆X) ∼ N(1, 0.5). During initial experience, subjects learn about this prior from trajectories; we determine the parameters of the HMM accordingly. We use BPTT to learn feedforward weights for the input spikes from the different modalities, and recurrent weights between the output units. The input population had 64 units per modality, while the state space and output population each had 100 units. Input spikes were Poisson, based on tuning functions centered on a grid within the 2D space; spatiotemporal kernels were based on the (gridded) output units. The model was tested in conditions directly matching the experiment, with the cursor and finger moving along a straight-line trajectory from the model's current estimate of the cursor position, ⟨X(t)⟩Q(X(t)|ST), to the target location. The model captures the main effects of the experiment (see Figure 3) with respect to visual blur. The prior was ignored when the sensory evidence was precise (σ0), it dominated on trials without feedback (σ∞), and the two factors combined on intermediate degrees of blurriness.

[Figure 3 panels: final deviation vs. imposed displacement for each blur level; plot data not recoverable as text.]

Figure 3: (a) Results for a typical subject from the first Körding-Wolpert experiment,7 for different degrees of blur in the visual feedback ({σ0, σM, σL, σ∞}): the average lateral displacement of the cursor from the target location at the end of the trial, as a function of the imposed displacement of the cursor from the finger location (∆X), which is drawn from N(1, 0.5). (b) Model results under the same testing conditions. See text for model details.

[Figure 4 panels (a-c); plot data not recoverable as text.]

Figure 4: (a) Bimodal prior P(∆X) ∼ N(±2, 0.5) for cursor displacement in the second Körding-Wolpert experiment.7 (b) Results from human subjects. (c) Model results.

In the second experiment the prior was bimodal (Fig. 4A) and feedback was blurred (σL). For this prior, the final cursor location should be based on the more prevalent displacements, so responses based on optimal inference should be nonlinear. Indeed, this is the case (Fig. 4B,C). Intuitively, the blurry visual feedback inadequately defines the true finger position, and thus the posterior P(X(t)|RT) is influenced by the learned bimodal prior; the network model accurately reconstructs this optimal posterior.

Our model generalizes the simple Bayesian account, and suggests new avenues for predictions. The dynamic nature of the system permits modeling the integration of several visual cues during the trial, as well as differential effects of the timing of visual feedback. The integration of cues in our model also allows it to capture interactions between them.
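The nonlinearity implied by a bimodal prior can be checked numerically: with a mixture-of-Gaussians prior and a Gaussian likelihood, the posterior mean is pulled toward the nearest prior mode rather than varying linearly with the observation. A sketch on a grid (all parameters illustrative, not fitted to the experiment):

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 2401)   # grid over displacements
dx = x[1] - x[0]

def gauss(x, mu, var):
    """Gaussian density with mean mu and variance var."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Bimodal prior: an equal mixture of Gaussians at -2 and +2 (illustrative).
prior = 0.5 * gauss(x, -2.0, 0.5) + 0.5 * gauss(x, 2.0, 0.5)

def posterior_mean(obs, obs_var):
    """Mean of the posterior over displacement given one blurred observation."""
    post = prior * gauss(x, obs, obs_var)
    post /= post.sum() * dx                 # normalize on the grid
    return float((x * post).sum() * dx)

# The optimal estimate is a nonlinear function of the observation:
for obs in [0.0, 0.5, 1.0, 2.0]:
    print(obs, posterior_mean(obs, 1.0))
```

Observations near zero yield estimates near zero, while slightly positive observations are pulled strongly toward the mode at +2, mirroring the qualitative nonlinearity of the responses in Fig. 4B,C.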
Finally, its 2D nature allows our system to model other aspects of combining visual and proprioceptive cues, such as their varying and contrasting degrees of certainty across space.14

5 DISCUSSION

We proposed a spiking population coding scheme for representing and propagating uncertainty which operates at the fine-grained timescale of individual inter-spike intervals. We motivated the key spatiotemporal spike kernels in the model from analytical results in a Gaussian process, and suggested two approximations to the exact decoding provided by these adaptive spatiotemporal kernels. The first is a regular stationary kernel, while the second is a recurrent network model. We showed how gradient descent can set model parameters to match the requirements on the output distribution and capture the dynamics underlying a hidden variable. This is a dynamic and spiking extension of DPC,15 and a population extension of Deneve.4 We showed its proficiency by comparison with exact inference in a random walk, and with a neural model that does not use a population code.

The most important direction concerns biologically plausible learning in the full spiking form of the model. One possibility is to view spikes as a primitive action chosen by a neuron. In this case, we can use the analog of direct policy methods in partially observable Markov decision processes,2 with faithful tracking of X(t) leading to reward. It is also possible that simpler, Hebbian rules will suffice. A second future direction concerns inference of one variable from another using our spiking population code model.
This problem involves marginalizing over intermediate variables, which is difficult in direct representations of distributions over these variables, due to approximating logs of sums with sums of logs;10 we are investigating how well our scheme can approximate this computation.

We applied the model to a challenging sensorimotor integration task which has been used to demonstrate Bayesian inference. Since it offers a dynamic account, we can make a number of predictions about the consequences of variations to the experiment. Most interesting would be the case in which a bimodal likelihood is combined with a unimodal (or bimodal) prior (rather than vice-versa), or indeed two instances of visual feedback during the task.

Acknowledgements
We thank Sophie Deneve and Jon Pillow for helpful discussions. RZ & RN funded by NSERC, CIHR NET program; PD & QH by Gatsby Charitable Fdtn., BIBA consortium, UCL MB/PhD program.

References

[1] Anderson, C.H. & Van Essen, D.C. (1994). Neurobiological computational systems. In: Computational Intelligence: Imitating Life, Zurada, Marks, Robinson (ed.), IEEE Press, 213-222.
[2] Baxter, J. & Bartlett, P. (2001). Infinite-horizon policy-gradient estimation. JAIR, 319-350.
[3] Carpenter, R.H.S. & Williams, M.L.L. (1995). Neural computation of log likelihood in the control of saccadic eye movements. Nature, 377: 59-62.
[4] Deneve, S. (2004). Bayesian inference in spiking neurons. NIPS-17.
[5] Gold, J.I. & Shadlen, M.N. (2001). Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Sciences, 5: 10-16.
[6] Hinton, G.E. & Brown, A.D. (2000). Spiking Boltzmann machines. NIPS-12: 122-129.
[7] Kording, K.P. & Wolpert, D. (2004). Bayesian integration in sensorimotor learning. Nature, 427: 244-247.
[8] Latham, P., Deneve, S., & Pouget, A. (2004). Optimal computation with attractor networks. J Physiology, Paris.
[9] Pouget, A., Zhang, K., Deneve, S., & Latham, P.
(1998). Statistically efficient estimation using population codes. Neural Computation, 10: 373-401.
[10] Rao, R. (2004). Bayesian computation in recurrent neural circuits. Neural Computation, 16(1).
[11] Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1999). Spikes. MIT Press.
[12] Sahani, M. & Dayan, P. (2003). Doubly distributional population codes: Simultaneous representation of uncertainty and multiplicity. Neural Computation, 15.
[13] Saunders, J.A. & Knill, D.C. (2001). Perception of 3D surface orientation from skew symmetry. Vision Research, 41(24): 3163-3183.
[14] Van Beers, R.J., Sittig, A.C., & Denier, J.J. (1999). Integration of proprioceptive and visual position-information. J Neurophysiol, 81: 1355-1364.
[15] Zemel, R.S., Dayan, P. & Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation, 10: 403-430.