{"title": "Optimal Change-Detection and Spiking Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 1545, "page_last": 1552, "abstract": null, "full_text": "Optimal Change-Detection and Spiking Neurons\n\nAngela J. Yu CSBMB, Princeton University Princeton, NJ 08540 ajyu@princeton.edu\n\nAbstract\nSurvival in a non-stationary, potentially adversarial environment requires animals to detect sensory changes rapidly yet accurately, two oft competing desiderata. Neurons subserving such detections are faced with the corresponding challenge to discern \"real\" changes in inputs as quickly as possible, while ignoring noisy fluctuations. Mathematically, this is an example of a change-detection problem that is actively researched in the controlled stochastic processes community. In this paper, we utilize sophisticated tools developed in that community to formalize an instantiation of the problem faced by the nervous system, and characterize the Bayes-optimal decision policy under certain assumptions. We will derive from this optimal strategy an information accumulation and decision process that remarkably resembles the dynamics of a leaky integrate-and-fire neuron. This correspondence suggests that neurons are optimized for tracking input changes, and sheds new light on the computational import of intracellular properties such as resting membrane potential, voltage-dependent conductance, and post-spike reset voltage. We also explore the influence that factors such as timing, uncertainty, neuromodulation, and reward should and do have on neuronal dynamics and sensitivity, as the optimal decision strategy depends critically on these factors.\n\n1\n\nIntroduction\n\nAnimals interacting with a changeable, potentially adversarial environment need to excel in the detection of changes in its sensory inputs. This detection, however, is riddled by the inherently competing goals of accuracy and speed. 
Due to the noisy and incomplete nature of sensory inputs, the animal can generally achieve more accurate detection by waiting for more sensory inputs. Gathering this extra information, however, incurs an opportunity cost: the extra time could be used to gather more food, attract a mate, or escape a predator. Neurons subserving the detection process face a similar speed-accuracy trade-off. In this work, we aim to understand the computations performed by a neuron at the time scale of single spikes. How sensitive a neuron is to each input spike should depend on the relative probabilities of the input representing noise versus useful information, and on the relative costs of misinterpretation. We formulate the problem as an example of change-detection, and characterize the optimal decision policy in this context.

The formal tools we use to formalize the change-detection problem build upon work in the area of controlled stochastic processes. Controlled stochastic processes refer to decision-making in environments plagued not only by inferential uncertainty about the state of the world, but also by uncertainty about the consequences of an action or decision on the world itself. Finding optimal decision policies for such processes is an actively researched problem in financial mathematics and operations research. As we will discuss below, neuronal change-detection is a prime example of such a problem.

In Sec. 2, we introduce the general framework of change-detection. In Sec. 3, we apply the framework to a specific scenario similar to that faced by the neuron, characterize the optimal solution, and demonstrate that the optimal information accumulation and decision process has dynamics remarkably resembling those of a spiking neuron; we examine the computational import of certain intracellular properties, characterize the input-output firing-rate relationship, and extend the framework to multi-source detection. In Sec.
4, we explore the behavioral consequences of optimal change-detection and examine issues such as the speed-accuracy trade-off, temporal and spatial cueing, and neuromodulation.

2 A Bayesian Formulation of the Change-Detection Problem

The Generative Model. Suppose we have sequential inputs x_1, x_2, ..., generated i.i.d. from a distribution f_0(x) before time ν ∈ {0, 1, ...}, and from a distribution f_1(x) afterwards, where the random variable (r.v.) ν denotes the sudden, hidden change time. ν has an initial probability P(ν = 0) = q_0, and a geometric distribution thereafter: P(ν = t) = (1 − q_0)(1 − q)^{t−1} q, for t > 0. The change-detection problem is concerned with finding the optimal decision policy for reporting the change from f_0 to f_1 as early as possible while minimizing false alarms [1]. A decision policy π is a mapping, possibly stochastic, from all observations made so far to the control (or action) set, π(x^t ≡ {x_1, ..., x_t}) ∈ {a_1, a_2}. The action a_1 terminates the observation process and reports ν ≤ t; a_2 continues the observation for another time step. Every unique decision policy is identified by a corresponding r.v. of stopping times τ ∈ {0, 1, ...}. In the following, we use π and τ interchangeably to refer to a policy.

The Loss Function. Following convention [2], we assume a loss function linear in false alarms and detection delay:

    l(ν, τ) = 1_{τ<ν} + 1_{τ≥ν} c(τ − ν)    (1)

where 1 is the indicator function, and c > 0 is a constant that specifies the relative importance of speed and accuracy. The total loss is the expectation of this loss function over ν and τ:

    L^π ≡ ⟨l(ν, τ)⟩_{ν,τ} = P(τ < ν) + c⟨(τ − ν)^+⟩    (2)

An optimal policy π* minimizes L^π.
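As a concrete illustration (ours, not from the paper), the prior over the change time ν and the loss of Eq. 1 can be transcribed directly into Python; the function names are our own:

```python
import random

def sample_change_time(q0, q, rng=random):
    """Sample the hidden change time: P(nu = 0) = q0, and
    P(nu = t) = (1 - q0) * (1 - q)**(t - 1) * q for t > 0."""
    if rng.random() < q0:
        return 0
    t = 1
    while rng.random() >= q:  # geometric waiting time on {1, 2, ...}
        t += 1
    return t

def loss(nu, tau, c):
    """Eq. 1: unit cost for a false alarm (tau < nu), c per time step of delay."""
    return 1.0 if tau < nu else c * (tau - nu)
```

Averaging `loss` over many sampled pairs (ν, τ) produced by a candidate policy gives a Monte-Carlo estimate of the total loss L^π of Eq. 2.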
Due to the linear loss in detection delay, the expected loss blows up for all policies that do not stop almost surely (a.s.; with probability 1) in finite time; we therefore restrict the optimization in the following to the class of almost-surely finite-time policies. Using the notation P_t ≡ P(ν ≤ t | x^t), we have

    P(τ < ν) = ⟨1 − P_τ⟩,    ⟨(τ − ν)^+⟩ = ⟨ Σ_{k=0}^{τ−1} P_k ⟩,

where the expectations are over τ and x^τ. The cumulative posterior probability P_τ at the detection time τ is therefore the critical factor in loss evaluation and policy optimization:

    L^π = ⟨ c Σ_{k=0}^{τ−1} P_k + (1 − P_τ) ⟩.    (3)

Bayes' Rule gives the iterative update rule for the cumulative posterior P_t ≡ P(ν ≤ t | x^t):

    P_{t+1} = (P_t + (1 − P_t)q) f_1(x_{t+1}) / [ (P_t + (1 − P_t)q) f_1(x_{t+1}) + (1 − P_t)(1 − q) f_0(x_{t+1}) ],    P_0 = q_0.    (4)

P_{t+1} is a deterministic function of P_t and x_{t+1}, but appears to follow a stochastic trajectory since x_{t+1} is an i.i.d.-distributed r.v. The expectation of P_{t+1} given x^t is P_t + (1 − P_t)q. We also define the monotonically related posterior ratio φ_t ≡ P_t/(1 − P_t), which has the update rule

    φ_{t+1} = f_1(x_{t+1})(φ_t + q) / [ f_0(x_{t+1})(1 − q) ],    φ_0 = q_0/(1 − q_0).    (5)

Optimal Policy: Threshold Crossing. To optimize over the space of all possible stopping rules (policies), we define the following: (1) the conditional termination cost C_t associated with stopping at time t after observing x^t: C_t ≡ c Σ_{i=0}^{t−1} P_i + (1 − P_t); (2) the minimal conditional cost ρ_t to be expected after observing x^t: ρ_t ≡ ess inf_τ ⟨C_τ | x^t⟩, where τ ranges over all stopping rules that terminate no earlier than t, and the expectation is taken over all future observations (which can depend on the decision taken at every time step); (3) ess inf, the largest r.v. that is (a.s.) less than every r.v. X_n, n ∈ N.
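The update rules of Eqs. 4 and 5 above translate line-for-line into code. Below is a minimal sketch (our own naming), with f0 and f1 passed in as likelihood functions:

```python
def update_posterior(P, x, q, f0, f1):
    """Eq. 4: cumulative posterior P_{t+1} from P_t and a new observation x."""
    prior = P + (1.0 - P) * q  # P(change by time t+1), before seeing x
    num = prior * f1(x)
    return num / (num + (1.0 - P) * (1.0 - q) * f0(x))

def update_ratio(phi, x, q, f0, f1):
    """Eq. 5: posterior ratio phi_t = P_t / (1 - P_t), updated in one step."""
    return f1(x) * (phi + q) / (f0(x) * (1.0 - q))

# consistency check with Bernoulli likelihoods (rates as in Fig. 1)
f0 = lambda x: 0.13 if x == 1 else 0.87
f1 = lambda x: 0.17 if x == 1 else 0.83
P, q = 0.2, 0.0125
P_next = update_posterior(P, 1, q, f0, f1)
phi_next = update_ratio(P / (1 - P), 1, q, f0, f1)
assert abs(phi_next - P_next / (1 - P_next)) < 1e-9
```

The two recursions are algebraically equivalent (divide the numerator and denominator of Eq. 4 by (1 − P_t)), which the final assertion confirms numerically.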
As an instance of Bellman's equation, ρ_t satisfies the dynamic-programming equation ρ_t = min{C_t, ⟨ρ_{t+1} | x^t⟩}, and the stationary, deterministic stopping rule τ* = min{t ≥ 0 : ρ_t = C_t} achieves the optimum of Eq. 2. This implies that the optimal policy consists of a stopping region S ⊆ [0, 1] and a continuation region C = [0, 1] \ S, such that π(P_t) = a_1 for P_t ∈ S and π(P_t) = a_2 for P_t ∈ C. We state and prove a useful theorem below, which implies that C and S neatly fall into two contiguous blocks, so that the optimal policy takes the termination action as soon as P_t exceeds some fixed threshold B*; i.e., the optimal policy is a first-passage process in P_t!

Before presenting the theorem, we first introduce the method of truncation. The difficulty in solving the dynamic equation for ρ_t lies in its infinite recursiveness. If we impose a finite horizon T on τ, then the finitely recursive relation ρ_t^T = min{C_t, ⟨ρ_{t+1}^T | x^t⟩}, with ρ_T^T = C_T, has a corresponding finite-horizon optimal policy τ*_T. Taking the infinite limit ρ_t^∞ ≡ lim_{T→∞} ρ_t^T, it has been shown [2] that when the expected loss is finite (which is the case here, since the expression in Eq. 2 is finite for all decision policies that stop a.s. in finite time), ρ_t^∞ = ρ_t, and τ*_T converges to the infinite-horizon optimal policy τ*. We also note the following self-evident lemma.

Lemma. Suppose {g_i(t)}_{i∈I} is a family of decreasing functions of t, and h(t) = Σ_i g_i(t) w_i(t), where Σ_i w_i(t) = 1 for all t. If g_i(t) ≥ g_j(t) implies dw_i/dt ≤ dw_j/dt, then h(t) decreases with t.

Theorem. C_t − ⟨ρ_{t+1}^T | x^t⟩ is a decreasing function of P_t.

Proof. C_{T−1} − ⟨ρ_T^T | x^{T−1}⟩ decreases with P_{T−1}. Assume the theorem holds for t + 1, and note that

    C_t − ⟨ρ_{t+1}^T | x^t⟩ = −(c + q)P_t + q + Σ_i g_i w_i,

where g_i ≡ max(0, l_i), l_i ≡ C_{t+1} − ⟨ρ_{t+2}^T | x^t, x_{t+1} = i⟩, and w_i ≡ P(x_{t+1} = i | x^t). Each g_i decreases with P_t, since l_i decreases with P_{t+1} by the inductive hypothesis, and P_{t+1} increases with P_t by Eq. 4.
Now suppose i, j are such that f_1(i) − f_0(i) > f_1(j) − f_0(j); then φ_{t+1}(i) > φ_{t+1}(j), and P_{t+1}(i) > P_{t+1}(j), for any given x^t. The inductive hypothesis then implies g_i ≤ g_j. Note also that dw_k/dP_t = (f_1(k) − f_0(k))(1 − q), so dw_i/dP_t ≥ dw_j/dP_t. By the Lemma, C_t − ⟨ρ_{t+1}^T | x^t⟩ decreases with P_t.

This theorem states that the cost of stopping at time t, relative to continuing, gets smaller as it becomes more certain that ν ≤ t. This holds for any finite horizon T, and therefore also in the infinite-horizon limit. If C_t − ⟨ρ_{t+1} | x^t⟩ is negative for some value of P_t, then the optimal policy selects action a_1 there; by the theorem, the same is true for all larger values of P_t. Defining B* ∈ [0, 1] as the lower bound of all such P_t, the stopping and continuation regions have the form [B*, 1] and [0, B*), respectively.

Ideally, we would like an exact solution for the optimal policy as a function of the generative and cost parameters of the change-detection problem as defined in Sec. 2. While the explicit form of B* is not known, the theorem allows us to find the optimal policy numerically, by evaluating and minimizing the empirical loss as a function of the decision threshold B ∈ [0, 1].

3 Neuronal Change-Detection

In the following, we focus on the specific case where f_0 and f_1 are Bernoulli processes with respective rate parameters λ_0 and λ_1. This case resembles the problem faced by neurons, which receive sequential binary inputs (spike = 1, no spike = 0) with approximately Poisson statistics. The Bernoulli process is a discrete-time analogue of the Poisson process, and obviates the problematic assumption (made by the Poisson model) that spikes can be fired infinitely close to one another. For now, we assume that the generative parameters λ_1, λ_0, q_0, q and the cost parameter c are known. We also assume, without loss of generality, that λ_1 > λ_0 (the rate increases), since otherwise we can simply relabel the inputs (0 ↔ 1).
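For the Bernoulli case just introduced, the numerical procedure from the end of Sec. 2, estimating the empirical loss for each candidate threshold B and keeping the best, can be sketched as follows. This is our own minimal Monte-Carlo implementation, with the parameters of Fig. 1 as defaults:

```python
import random

def empirical_cost(B, n_trials=200, lam0=0.13, lam1=0.17,
                   q=0.0125, q0=0.05, c=0.0005, seed=0):
    """Monte-Carlo estimate of the expected loss (Eq. 2) for a threshold B on P_t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # sample the hidden change time nu from its prior
        nu = 0
        if rng.random() >= q0:
            nu = 1
            while rng.random() >= q:
                nu += 1
        P, t = q0, 0
        while P < B and t < 100000:  # step cap is only a numerical safeguard
            t += 1
            lam = lam1 if t >= nu else lam0      # input statistics switch at nu
            x = 1 if rng.random() < lam else 0
            l0 = lam0 if x else 1.0 - lam0       # f0(x)
            l1 = lam1 if x else 1.0 - lam1       # f1(x)
            prior = P + (1.0 - P) * q
            P = prior * l1 / (prior * l1 + (1.0 - P) * (1.0 - q) * l0)
        total += 1.0 if t < nu else c * (t - nu)
    return total / n_trials

costs = {B: empirical_cost(B) for B in (0.55, 0.60, 0.65, 0.70, 0.75, 0.80)}
best_B = min(costs, key=costs.get)
```

A finer grid and more trials sharpen the estimate; as Fig. 1(a) shows, the cost surface is shallow near its minimum, so modest sampling noise mostly affects which of several near-optimal thresholds wins.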
When the parameters satisfy c ≥ (λ_1 − λ_0 − q(1 − λ_0))/(1 − λ_1), we have the explicit solution B* = q/(q + c), or equivalently φ* ≡ q/c (proof omitted). This corresponds to the one-step look-ahead policy, and is optimal when the cost of delay is large or when the probability of the change taking place is very high. This turns out not to be a very interesting case, as the detection process is driven across the threshold even in the absence of any input spikes.

Although we do not have an explicit solution for the optimal detection threshold B* in general, we can numerically compare different values of B for any specific problem. Fig. 1(a) shows the empirical cost averaged over 1000 trials for different threshold values. For these particular parameters, the minimum is around B = 0.65, although the cost function is quite shallow over a large range of values of B around the optimum, implying that performance is not particularly sensitive to relatively large perturbations around the optimal value.

Repeated Change-Detection and Firing Rate. From the problem formulation in Sec. 2, it might seem that the framework applies only to detecting a single change, or multiple unrelated changes. However, the same policy formulation applies to the repeated detection of changes, one after another, in a temporally contiguous fashion. As long as each detection event is generated from the same model parameters (q, q_0, f_1, f_0), and the cost parameter c remains constant, the threshold-crossing policy is still optimal in minimizing the empirical expected loss over these repeated events. The only generative parameter affected by the repetition is q_0, which represents the probability that the inputs are already being generated from f_1 when the current observation process begins. In this repeated-detection scenario, q_0 should in general be high if the detection threshold B* is high, and low if B* is low.
However, the strength of this coupling is tempered by (i) whether each detection resets the generative process, as happens when visual detection leads to saccades and thus a resetting of input statistics, and (ii) the amount of time elapsed during the refractory period after a detection spike. Fortunately, while q_0 is influenced by the detection policy, the optimization of the policy is not influenced by q_0, since it consists of comparing C_t and ⟨ρ_{t+1} | x^t⟩ at every time step, and this comparison does not depend on q_0, which simply adds a linear factor to both terms.

In this repeated-firing scenario, where the number of spikes is high relative to the frequency of changes, the loss function of Eq. 2 can be rewritten as L = p_0 r_0 + c/r_1, where r_i is the mean firing rate when the inputs are generated from f_i, and p_0 is the fraction of time for which f_0 is in effect (as opposed to f_1). In other words, if the rate of change is slow compared to neuronal firing rates, then optimal processing amounts to minimizing the "spontaneous" firing rate during f_0 and maximizing the "stimulus-evoked" firing rate during f_1.

Optimality and Dynamics of Leaky Integrate-and-Fire. Fig. 1(b) illustrates this concept of repeated firing. The top panel shows an example trace of the dynamical variable φ_t in the repeated optimal change-detection process. Whenever φ_t reaches the threshold 0.65/(1 − 0.65) (equivalently, when P_t reaches 0.65, the optimal threshold determined in the last section), a change is reported and the whole process resets to φ_0. The dynamics of φ_t are remarkably similar to those of a leaky integrate-and-fire neuron. The bottom panel shows a raster plot of input and output spikes over 25 trials, and again the resemblance to spiking neurons is remarkable. Closer inspection indicates that the update rule for the posterior ratio in Eq. 5 indeed approximates the dynamics of a leaky integrate-and-fire neuron [3]. Letting a ≡ f_1(x_t)/((1 − q) f_0(x_t)), we can rewrite Eq.
5 as f1 (xt ) t = a(t-1 + q )\n1 (1-q )0\n\n(6)\n\nWhen xt = 1, a = > 1, t increases, and the rate of increase is larger when t itself is larger. This is reminiscent of the near-threshold dynamics of the Hodgkin-Huxley model, in which the voltage-dependent activation of sodium conductance drives the neuron to fire [4]. When xt = 0, t converges to 0 = f1 q /(f0 (1 - q ) - f1 ) (by Eq. 5), which is greater than 0 when  f0 (0)/f1 (0)  1 - q . We can think of 0 as the resting membrane potential. Since 0 increases   with q , it implies that the resting potential should be higher and closer to the firing threshold, making the neuron more sensitive to synaptic inputs, when there is a stronger expectation that a change is imminent. Relationship Between Input and Output Firing Rates We can also look at the input-output relationship at the firing-rate level. The state-dependent rate parameter a has the expected values: 1 2 + 0 - 20 1 1 1 a1 a|f1 = . a0 a|f0 = 1-q 1-q 0 - 2 0 Given Eqs. 5 and 6, we can write down an approximate, explicit expression for t |fi :  . t k-1 q ai q (1 - at ) i t t k t  ai t |fi  ai ( t-1 +q) = ai 0 + ai q (7) ai = ai 0 + 0+ 1 - ai ai - 1\n=0\n\n\f\n(a)\n\nDynamics of \n1\n\n(b)\nF req u en cy 0.2 0.1 0\n\nDistribution of input spikes\n\n(c)\n\nCost as a function of thresholds 0.8 0.7 0.6 Cost\n\n0.5\n\n0 0\n\n100\n\n200\n\n300\n\n400\n\n500\n\n-200\n\n-100\n\n0\n\n100\n\n200\n\n0.5 0.4 0.3 0.2\n\nInput and output spikes\n0 10 20 0 100 200 300 Time (samples) 400 500\nF r e q u e n cy 0.04 0.02 0\n\nDistribution of output spikes\n\n-200\n\n-100\n\n0 Time\n\n100\n\n200\n\n0.1 0.6\n\n0.65\n\n0.7\n\n0.75 0.8 Threshold\n\n0.85\n\n0.9\n\nFigure 1: Optimal change-detection and dynamics. (a) The empirical average cost (over 1000 trials) has a single shallow minimum at B = 0.65. 0 = 0.13, 1 = 0.17, q = 0.0125, q0 = 0.05, c = 0.0005; these parameters apply for the remainder of the paper unless otherwise specified. 
(b) Top panel: a typical example of the dynamics of φ_t over time. Superimposed on φ_t are the output spikes, arbitrarily set to a fixed high value. Black bars near the bottom indicate input spikes; the green line indicates the time of the actual change. In this example, a chance flurry of input spikes near the start causes the optimal change-detector to fire; after the change, the increased input firing rate makes the change-detector fire much more frequently. Note that φ_t decays whenever there is a lull in input spikes. Bottom panel: raster plot of input (blue) and output (red) spikes; both are more frequent after the change, indicated by the green line. (c) Output spikes (bottom) increase in frequency quickly after the increase in input spikes (top).

Given the decision threshold B, ⟨φ_{t_0} | f_0⟩ = ⟨φ_{t_1} | f_1⟩ = B/(1 − B), where t_i is the average number of time steps it takes to reach the threshold when the inputs are generated from f_i, and can be assumed to be ≫ 1 (it takes many time steps of input integration to reach the threshold). We therefore have

    a_1^{t_1} = a_0^{t_0} (q/(a_0 − 1) + φ_0)/(q/(a_1 − 1) + φ_0) ≈ a_0^{t_0},    i.e.,    a_1 ≈ a_0^{t_0/t_1}.    (8)

Therefore the ratio of the output firing rates, r_i ≡ 1/t_i for i = 0, 1, is

    r_1/r_0 = t_0/t_1 = log a_1 / log a_0 = 1 + log[(λ_1^2 + λ_0 − 2λ_0λ_1)/(λ_0 − λ_0^2)] / log[1/(1 − q)].    (9)

Since the arguments of the logarithms in both the numerator and the denominator are greater than 1, r_1/r_0 > 1. Therefore, when the input rates satisfy λ_1 > λ_0, the output rates also satisfy r_1 > r_0. To see exactly how the output firing-rate ratio changes as a function of the input rates, we define the function g(λ_0, λ_1) ≡ (λ_1^2 + λ_0 − 2λ_0λ_1)/(λ_0 − λ_0^2) and take its partial derivatives with respect to λ_0 and λ_1. We then see that the output firing ratio of Eq. 9 increases with λ_1 and decreases with λ_0, consistent with intuition. Fig.
1(c) shows the average detection/firing rate over time: the rise in output firing rate closely follows that in the input, despite the small change in the input firing rates.

Multi-source change-detection. So far, we have considered only the case of Bernoulli inputs uniformly changing from one rate to another. Sometimes, however, the problem at hand is one of multi-source change-detection. For instance, a visual neuron detecting the onset of a stimulus might receive inputs from up-stream neurons sensitive to stimuli with different properties (different colors, orientations, depths of view, etc.). Here, we extend our framework to the case of two independent sources of inputs, using an approach similar to that taken in [5]. The source f^i, i ∈ {1, 2}, emits observations x^i_1, x^i_2, ... from a Bernoulli process whose rate changes from λ^i_0 to λ^i_1 at an unknown time ν^i, where ν^i is generated by a geometric distribution with parameter q^i, and the prior probability P(ν^i = 0) is q^i_0. The objective is to detect ν ≡ min(ν^1, ν^2) with the cost function specified as before (Eqs. 1-2). Defining the individual posteriors P^i_t ≡ P(ν^i ≤ t | x^{i,t}), where x^{i,t} ≡ {x^i_1, ..., x^i_t}, we have

    P(min(ν^1, ν^2) ≤ t | x^{1,t}, x^{2,t}) = 1 − (1 − P^1_t)(1 − P^2_t) = P^1_t + P^2_t − P^1_t P^2_t.    (10)

We can also define the corresponding overall posterior ratio in terms of the individual posterior ratios φ^i_t ≡ P^i_t/(1 − P^i_t):

    φ_t ≡ P_t/(1 − P_t) = φ^1_t + φ^2_t + φ^1_t φ^2_t.    (11)

Figure 2: Effect of cueing on change-detection. (a) Distribution of first spikes for the optimal stopping policy; spikes aligned to time 0, when the actual change takes place. (b) This distribution is significantly tightened, with its mean brought closer to the actual change, when there is extra temporal information about an imminent change (q = .02). (c) The distribution of spikes is also slightly tightened and brought closer to the actual change time when there is a stronger prior probability of a stimulus appearing (q_0 = .1), as during spatial cueing.
The effect is smaller because the higher prior leads to false alarms as well as reducing the detection delay.

Following reasoning very close to that of Sec. 2, we can show that if the generative and cost parameters are such that φ_t is lower-bounded by φ_0 for t ≥ 1, then the optimal stopping/detection policy is to terminate at the smallest t such that φ_t = φ^1_t + φ^2_t + φ^1_t φ^2_t ≥ (q^1 + q^2 − q^1 q^2)/c. Despite the generative independence of the two Bernoulli processes, we note that the optimal policy differs from the naïve strategy of running two single-source change-detectors and reporting a change as soon as either one does. To see this, consider the case in which φ^1_t = q^1/c but φ^2_t ≈ 0, so that φ_t ≈ φ^1_t = q^1/c < (q^1 + q^2(1 − q^1))/c. The individual detector for process 1 would report a change, but the overall detector would not.

4 Optimal Change-Detection and Neuromodulation

A sizeable body of behavioral studies suggests that stimulus processing is influenced by cognitive factors such as knowledge about the timing of stimulus onset, or about whether a stimulus will appear in a particular location. There is evidence that the neuromodulators norepinephrine [6] and acetylcholine [7] are respectively involved in these two aspects of stimulus processing. Separately, there is a rich literature on the effects of these neuromodulators at the single-cell level [8]. Since we have an explicit model of neuronal dynamics as a function of the statistical properties associated with the stimulus, we are ideally positioned to examine how these properties should affect cellular properties, and whether the known behavioral consequences of neuromodulation are consistent with their observed effects at the cellular level.
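Before looking at the specific cueing results, it is worth noting concretely where the two kinds of prior knowledge enter the Bernoulli detector of Sec. 3: the spatial prior q_0 sets the starting value of the posterior ratio, φ_0 = q_0/(1 − q_0), while the temporal hazard q sets the no-spike resting level φ̃_0 = f_1(0)q/(f_0(0)(1 − q) − f_1(0)). A small sanity check in code (our own function names, parameters from Fig. 1):

```python
def phi_start(q0):
    """Initial posterior ratio phi_0 = q0 / (1 - q0); raised by a spatial cue."""
    return q0 / (1.0 - q0)

def phi_rest(q, lam0=0.13, lam1=0.17):
    """No-spike fixed point of Eq. 6 (the 'resting potential'); raised by a temporal cue."""
    f0_at_0, f1_at_0 = 1.0 - lam0, 1.0 - lam1  # likelihoods of observing x = 0
    return f1_at_0 * q / (f0_at_0 * (1.0 - q) - f1_at_0)

# both quantities move toward the firing threshold as the cue strengthens
assert phi_start(0.10) > phi_start(0.05)    # spatial cue: q0 = 0.05 -> 0.1
assert phi_rest(0.02) > phi_rest(0.0125)    # temporal cue: q = 0.0125 -> 0.02
```

Either effect shortens the distance the posterior ratio must travel to reach the threshold, which is the mechanistic counterpart of the tightened first-spike distributions in Fig. 2.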
If the system has some prior knowledge about the onset time of a stimulus, we can model the information accumulation process as starting shortly before the mean change time, with a tight distribution over the random variable ν; making q larger achieves both effects in our model. Fig. 2(a) shows the distribution of first spikes over 1000 trials; Fig. 2(b) shows that this distribution is more tightly clustered immediately after the actual change time ν for larger q. Experimentally, it has been observed that norepinephrine makes sensory neurons fire more vigorously to bottom-up sensory inputs [8]. It is also known from behavioral studies that a temporal cue improves detection performance, and that noradrenergic depletion diminishes this advantage [6].

If there is prior knowledge that the stimulus will appear in a particular location, we can model this with a higher prior probability q_0 of the stimulus being present. This also has the effect of increasing the responsiveness of the change-detection process to input spikes (Fig. 2(c)), making the detection (spiking) process more sensitive. It has been shown experimentally that a (correct) spatial cue improves stimulus detection, that acetylcholine is implicated in this process [7], and that acetylcholine potentiates neurons and increases their responsiveness to sensory inputs [8].

5 Discussion

Responding accurately and rapidly to changes in the environment is a problem confronted by the brain at every level, from single neurons to behavior. In this work, we have presented a formal treatment of the change-detection problem and obtained important properties of the optimal policy: for a broad class of problems, the optimal detection algorithm is a threshold-crossing process based on the posterior probability of the change having taken place, which can be iteratively updated using Bayes' Rule.
Applying these ideas to the case of neurons that must rapidly and accurately detect changes in input spike statistics, we saw that the optimal algorithm yields dynamics remarkably similar to the intracellular dynamics of spiking neurons. This suggests that neurons are optimized for tracking discrete, abrupt changes in their inputs. The model yields insight into the computational import of cellular properties such as the resting membrane potential, post-spike reset potential, voltage-dependent conductances, and the input-output spiking relationship. The basic framework was extended to the case of multi-source change-detection, a problem faced by a neuron tasked with detecting a stimulus that could belong to one of two possible sub-categories. We also explored the computational consequences of spatial and temporal cueing on stimulus detection, and saw that the behavioral and biophysical effects of neuromodulation (e.g., by acetylcholine and norepinephrine) are consistent within the framework.

This novel framework for modeling single-neuron computation is attractive because it suggests explicit design principles underlying neuronal dynamics, rather than merely providing a descriptive model. Since the computational objects are well specified at the outset, it provides a natural theoretical link between cellular properties and behavioral constraints. It is also appealing as a self-consistent and elegantly simple model of the computations taking place in single neurons. Every neuron in this scheme simply detects changes in its synaptic inputs, on a spike-to-spike time scale, and propagates its knowledge according to its own speed-accuracy trade-off. All that a down-stream neuron needs from this neuron for its own change-detection computations are this neuron's average firing rates in its different states, the rate of change among these states, and the prior probability of this neuron being in one of those states; all of these quantities can be learned over a longer time scale.
In particular, the down-stream neuron does not need to know about this neuron's inputs, its internal dynamics, its decision policy, its objective function, or its model of the world. In this scheme, more sophisticated computations can be achieved by pooling the outputs of different neurons in various configurations; we explored this briefly with the example of multi-source change-detection. Another advantage of this framework is that it eliminates the boundary between inference and decision: neurons make inferences about their inputs and make decisions at every level of processing. It therefore obviates the question of where, in a hierarchical nervous system, the nature of the computation changes from input-processing to decision-making.

While incorporating formal tools from controlled stochastic processes into the modeling of single-cell computations is a novel approach, this work is related to several other theoretical efforts. The idea of neurons processing and representing probabilistic information has received much attention in recent years, with most work focusing on the level of neuronal populations [9-12]. Theoretical work on the representation and processing of probabilistic information in single neurons is comparatively rare. It has been suggested [13] that certain decision-making neurons may accumulate probabilistic information and spike when the evidence exceeds a certain threshold. However, those models typically assumed that the neurons already receive continuously-valued inputs representing probabilistic information. Moreover, the tasks considered in these earlier works involved stationary discrimination, with no explicit non-stationarity in the state of the world or the inputs. We note that our framework is a generalization of the commonly studied two-alternative forced choice (2AFC) task, which is equivalent to setting the change probability q to 0 in our model.
Consistent with this characterization, our optimal policy is a generalization of the SPRT algorithm, which is known to be optimal for stationary 2AFC discrimination [14]. One closely related piece of work proposed that single neurons track the log posterior ratio of an underlying binary state variable, and spike when the new inputs imply a value for this log posterior ratio sufficiently different from the neuron's current estimate based on previous inputs [15]. The key difference at the conceptual level is that this previous work focused on the explicit propagation of probabilistic information across neurons, thereby introducing complications into processing and learning that are necessary to keep this probabilistic knowledge consistent across neurons. There was also no explicit analysis of the optimality of the output spike-generation process: how much of a discrepancy merits a spike, and how this depends on the relevant statistical and cost parameters. At the mechanistic level, having the membrane potential represent the log posterior ratio, as opposed to the posterior ratio, requires the dynamical update rule to involve exponentiation. While that work showed the dynamics to be approximately leaky integrate-and-fire during steady state, this does not help in the most interesting case, when the world is rapidly changing and the linear approximation is most detrimental. We showed in this work that there are good reasons for neurons not to integrate inputs linearly: the amount of new evidence provided by each input (spike or no spike) at every time step is state-dependent, and should be so according to optimal information integration. This suggests that the particular types of nonlinearity we see in neuronal dynamics are desirable from a computational point of view.

One important assumption we made in our model is that the cost of detection delay is linear in time, parameterized by the constant c.
Without this assumption, the controlled dynamic process framework would not apply, as the decision policy would depend not only on a state variable but also explicitly on time. In general, however, there might not be a fixed c that captures the trade-off between false alarms and detection delay. Intuitively, c should be related to how much reward could be obtained per unit of time if the system were not engaged in prolonging the current observation process. In particular, if a new "trial" begins as soon as the current "trial" terminates, regardless of detection accuracy, then c should be set to P(τ ≥ ν)/⟨τ⟩, which also places the two cost terms in the same dimension. If we had analytical expressions for P(τ ≥ ν) and ⟨τ⟩ as functions of the decision threshold B, we could solve the optimization problem through the self-consistency constraint placed on the optimal threshold B* by its dependence on c. Unfortunately, no analytical expressions are known for P(τ ≥ ν; B) and ⟨τ; B⟩. Alternatively, one might still numerically obtain the fixed detection threshold that incurs the lowest cost among all thresholds. There is no guarantee, however, that the optimal policy lives in this parameterized family of policies; the best fixed-threshold policy may still be far from optimal detection.

There are several important and exciting directions in which we plan to extend the current work. One is the consideration of more complex state transitions. In this work, we assumed that the state transition is always from f_0 to f_1. In more general scenarios, the inputs are likely to revert back to f_0 before another transition into f_1, and so on. We would then need at least two populations of detectors, one detecting the onset (f_0 to f_1) and one detecting the offset (f_1 to f_0). Intuitively, there ought to be recurrent connections between them, to propagate and aggregate the total information about what state the inputs are in.
A related problem arises when the inputs can be in multiple (more than two) possible states, or even a continuous range of states, with complex transitions among them. Another interesting question is what happens when we have a different or more complex distribution for the change time ν. We know, for instance, that animals are capable of utilizing independent temporal information about the mean and variance of stimulus onset; in the geometric model we assumed, these two quantities are coupled. Finally, we note that the formal framework we presented, that of optimal detection of changes in input statistics, is applicable not only at the level of the single neuron, but also to systems- and cognitive-level problems. For example, certain problems in reinforcement learning, such as reversal learning and exploration versus exploitation in general, are also amenable to analysis by a similar approach. We intend to explore some of these problems in the future using similar formal tools from controlled dynamic processes.

Acknowledgments

We thank Bill Bialek, Peter Dayan, Savas Dayanik, and Sophie Deneve for helpful discussions.

References

[1] Shiryaev, A. N. (1978). Optimal Stopping Rules. Springer-Verlag, New York.
[2] Chow, Y. S., Robbins, H., & Siegmund, D. (1971). Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin, Boston.
[3] Dayan, P. & Abbott, L. F. (2001). Theoretical Neuroscience. MIT Press, Cambridge, MA.
[4] Hodgkin, A. L. & Huxley, A. F. (1952). J. Physiology 117: 500-544.
[5] Bayraktar, E. & Poor, H. V. (2005). Proc. 44th IEEE Conf. on Decision and Control and European Control Conference.
[6] Witte, E. A. & Marrocco, R. T. (1997). Psychopharmacology 132: 315-323.
[7] Phillips, J. M., McAlonan, K., Robb, W. G. K., & Brown, V. (2000). Psychopharmacology 150: 112-116.
[8] Gu, Q. (2002). Neuroscience 111: 815-835.
[9] Zemel, R. S., Dayan, P., & Pouget, A. (1998). Neural Computation 10: 403-430.
[10] Sahani, M. & Dayan, P. (2003). Neural Computation 15: 2255-2279.
[11] Rao, R. P. (2004).
Neural Computation 16: 1-38.
[12] Yu, A. J. & Dayan, P. (2005). Advances in Neural Information Processing Systems 17.
[13] Gold, J. I. & Shadlen, M. N. (2002). Neuron 36: 299-308.
[14] Wald, A. & Wolfowitz, J. (1948). Ann. Math. Statist. 19: 326-339.
[15] Deneve, S. (2004). Advances in Neural Information Processing Systems 16.