Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)

*Sander Bohte, Michael C. Mozer*

Experimental studies have observed synaptic potentiation when a presynaptic neuron fires shortly before a postsynaptic neuron, and synaptic depression when the presynaptic neuron fires shortly after. The dependence of synaptic modulation on the precise timing of the two action potentials is known as spike-timing dependent plasticity or STDP. We derive STDP from a simple computational principle: synapses adapt so as to minimize the postsynaptic neuron's variability to a given presynaptic input, causing the neuron's output to become more reliable in the face of noise. Using an entropy-minimization objective function and the biophysically realistic spike-response model of Gerstner (2001), we simulate neurophysiological experiments and obtain the characteristic STDP curve along with other phenomena including the reduction in synaptic plasticity as synaptic efficacy increases. We compare our account to other efforts to derive STDP from computational principles, and argue that our account provides the most comprehensive coverage of the phenomena. Thus, reliability of neural response in the face of noise may be a key goal of cortical adaptation.

1 Introduction

Experimental studies have observed synaptic potentiation when a presynaptic neuron fires shortly before a postsynaptic neuron, and synaptic depression when the presynaptic neuron fires shortly after. The dependence of synaptic modulation on the precise timing of the two action potentials, known as spike-timing dependent plasticity or STDP, is depicted in Figure 1. Typically, plasticity is observed only when the presynaptic and postsynaptic spikes (hereafter, pre and post) occur within a 20-30 ms time window, and the transition from potentiation to depression is very rapid. Another important observation is that synaptic plasticity decreases with increased synaptic efficacy. The effects are long lasting, and are therefore referred to as long-term potentiation (LTP) and depression (LTD). For detailed reviews of the evidence for STDP, see [1, 2].

Figure 1: (a) Measuring STDP experimentally: pre-post spike pairs are repeatedly induced at a fixed interval $\Delta t_{\text{pre-post}}$, and the resulting change to the strength of the synapse is assessed; (b) change in synaptic strength after repeated spike pairing as a function of the difference in time between the pre and post spikes (data from Zhang et al., 1998). We have superimposed an exponential fit of LTP and LTD.

Because these intriguing findings appear to describe a fundamental learning mechanism in the brain, a flurry of models has been developed that focus on different aspects of STDP, from biochemical models that explain the underlying mechanisms giving rise to STDP [3], to models that explore the consequences of STDP-like learning rules in an ensemble of spiking neurons [4, 5, 6, 7], to models that propose fundamental computational justifications for STDP. Most commonly, STDP is viewed as a type of asymmetric Hebbian learning with a temporal dimension. However, this perspective is hardly a fundamental computational rationale, and one would hope that such an intuitively sensible learning rule would emerge from a first-principle computational justification.

Several researchers have tried to derive a learning rule yielding STDP from first principles. Rao and Sejnowski [8] show that STDP emerges when a neuron attempts to predict its membrane potential at some time $t$ from the potential at time $t - \Delta t$. However, STDP emerges only for a narrow range of $\Delta t$ values, and the qualitative nature of the modeling makes it unclear whether a quantitative fit can be obtained. Dayan and Häusser [9] show that STDP can be viewed as an optimal noise-removal filter for certain noise distributions. However, even small variations from these noise distributions yield quite different learning rules, and the noise statistics of biological neurons are unknown. Eisele (private communication) has shown that an STDP-like learning rule can be derived from the goal of maintaining the relevant connections in a network. Chechik [10] is most closely related to the present work. He relates STDP to information theory via maximization of mutual information between input and output spike trains. This approach derives the LTP portion of STDP, but fails to yield the LTD portion. The computational approach of Chechik (as well as Dayan and Häusser) is premised on a rate-coding neuron model that disregards the relative timing of spikes. It seems quite odd to argue for STDP using rate codes: if spike timing is irrelevant to information transmission, then STDP is likely an artifact and is not central to understanding mechanisms of neural computation.
Further, as noted in [9], because STDP is not quite additive in the case of multiple input or output spikes that are near in time [11], one should consider interpretations that are based on individual spikes, not aggregates over spike trains.

Here, we present an alternative computational motivation for STDP. We conjecture that a fundamental objective of cortical computation is to achieve reliable neural responses, that is, neurons should produce the identical response (both in the number and timing of spikes) given a fixed input spike train. Reliability is an issue if neurons are affected by noise influences, because noise leads to variability in a neuron's dynamics and therefore in its response. Minimizing this variability will reduce the effect of noise and will therefore increase the informativeness of the neuron's output signal. The source of the noise is not important; it could be intrinsic to a neuron (e.g., a noisy threshold) or it could originate in unmodeled external sources causing fluctuations in the membrane potential uncorrelated with a particular input.

We are not suggesting that increasing neural reliability is the only learning objective. If it were, a neuron would do well to give no response regardless of the input. Rather, reliability is but one of many objectives that learning tries to achieve. This form of unsupervised learning must, of course, be complemented by supervised and reinforcement learning that allow an organism to achieve its goals and satisfy drives.

We derive STDP from the following computational principle: synapses adapt so as to minimize the entropy of the postsynaptic neuron's output in response to a given presynaptic input. In our simulations, we follow the methodology of neurophysiological experiments. This approach leads to a detailed fit to key experimental results. We model not only the shape (sign and time course) of the STDP curve, but also the fact that potentiation of a synapse depends on the efficacy of the synapse: it decreases with increased efficacy. In addition to fitting these key STDP phenomena, the model allows us to make predictions regarding the relationship between properties of the neuron and the shape of the STDP curve.

Before delving into the details of our approach, we give a basic intuition for it. Noise in spiking neuron dynamics leads to variability in the number and timing of spikes. Given a particular input, one spike train might be more likely than others, but the output is nondeterministic. By the entropy-minimization principle, adaptation should reduce the likelihood of these other possibilities. To be concrete, consider a particular experimental paradigm. In [12], a pre neuron is identified with a weak synapse to a post neuron, such that the pre is unlikely to cause the post to fire. However, the post can be induced to fire via a second presynaptic connection. In a typical trial, the pre is induced to fire a single spike, and with a variable delay, the post is also induced to fire (typically) a single spike. To increase the likelihood of the observed post response, other response possibilities must be suppressed.
With presynaptic input preceding the postsynaptic spike, the most likely alternative response is no output spikes at all. Increasing the synaptic connection weight should then reduce the possibility of this alternative response. With presynaptic input following the postsynaptic spike, the most likely alternative response is a second output spike. Decreasing the synaptic connection weight should reduce the possibility of this alternative response. Because both of these alternatives become less likely as the lag between pre and post spikes is increased, one would expect that the magnitude of synaptic plasticity diminishes with the lag, as is observed in the STDP curve.

Our approach to reducing response variability given a particular input pattern involves computing a gradient with respect to the synaptic weights in a differentiable model of spiking neuron behavior. We use the Spike Response Model (SRM) of [13] with a stochastic threshold, where the stochastic threshold models fluctuations of the membrane potential or the threshold outside of experimental control. For the stochastic SRM, the response probability is differentiable with respect to the synaptic weights, allowing us to calculate the entropy gradient with respect to the weights conditional on the presented input. Learning is presumed to take a gradient step to reduce this conditional entropy.

In modeling neurophysiological experiments, we demonstrate that this learning rule yields the typical STDP curve. We can predict the relationship between the exact shape of the STDP curve and physiologically measurable parameters, and we show that our results are robust to the choice of the few free parameters of the model.

Two papers in these proceedings are closely related to our work. They also find STDP-like curves when attempting to maximize an information-theoretic measure, the mutual information between input and output, for a Spike Response Model [14, 15].
Bell & Parra [14] use a deterministic SRM, which does not model the LTD component of STDP properly. The derivation by Toyoizumi et al. [15] is valid only for an essentially constant membrane potential with small fluctuations. Neither of these approaches has succeeded in quantitatively modeling specific experimental data with neurobiologically realistic timing parameters, and neither explains the saturation of LTD/LTP with increasing weights as we do. Nonetheless, these models make an interesting contrast to ours by suggesting a computational principle of optimization of information transmission, as contrasted with our principle of neural noise reduction. Perhaps experimental tests can be devised to distinguish between these competing theories.

2 The Stochastic Spike Response Model

The Spike Response Model (SRM), defined by Gerstner [13], is a generic integrate-and-fire model of a spiking neuron that closely corresponds to the behavior of a biological spiking neuron and is characterized in terms of a small set of easily interpretable parameters [16]. The standard SRM formulation describes the temporal evolution of the membrane potential based on past neuronal events, specifically as a weighted sum of postsynaptic potentials (PSPs) modulated by reset and threshold effects of previous postsynaptic spiking events. Following [13], the membrane potential of cell $i$ at time $t$, $u_i(t)$, is defined as:

$$
u_i(t) = \eta(t - \hat{f}_i) + \sum_{j \in \Gamma_i} w_{ij} \sum_{f_j \in \mathcal{F}_j^t} \varepsilon(t - \hat{f}_i,\; t - f_j), \tag{1}
$$

where $\Gamma_i$ is the set of inputs connected to neuron $i$, $\mathcal{F}_j^t$ is the set of times prior to $t$ that neuron $j$ has spiked, $\hat{f}_i$ is the time of the last spike of neuron $i$, $w_{ij}$ is the synaptic weight from neuron $j$ to neuron $i$, $\varepsilon(t - \hat{f}_i, t - f_j)$ is the PSP in neuron $i$ due to an input spike from neuron $j$ at time $f_j$, and $\eta(t - \hat{f}_i)$ is the refractory response due to the postsynaptic spike at time $\hat{f}_i$. Neuron $i$ fires when the potential $u_i(t)$ exceeds a threshold ($\vartheta$) from below.

The postsynaptic potential $\varepsilon$ is modeled as the differential alpha function in [13], defined with respect to two variables: the time since the most recent postsynaptic spike, $x$, and the time since the presynaptic spike, $s$:

$$
\varepsilon(x, s) = \frac{1}{1 - \tau_s/\tau_m} \left[ \left( \exp\left(-\frac{s}{\tau_m}\right) - \exp\left(-\frac{s}{\tau_s}\right) \right) H(s)\,H(x - s) + \exp\left(-\frac{s - x}{\tau_s}\right) \left( \exp\left(-\frac{x}{\tau_m}\right) - \exp\left(-\frac{x}{\tau_s}\right) \right) H(x)\,H(s - x) \right], \tag{2}
$$

where $\tau_s$ and $\tau_m$ are the rise and decay time-constants of the PSP, and $H$ is the Heaviside function. The refractory reset function is defined to be [13]:

$$
\eta(x) = u_{abs}\,H(\Delta_{abs} - x)\,H(-x) + u_{abs} \exp\left(-\frac{x + \Delta_{abs}}{\tau_r^f}\right) + u_r^s \exp\left(-\frac{x}{\tau_r^s}\right), \tag{3}
$$

where $u_{abs}$ is a large negative contribution to the potential to model the absolute refractory period, with duration $\Delta_{abs}$. We smooth this refractory response by a fast decaying exponential with time constant $\tau_r^f$. The third term in the sum represents the slow decaying exponential recovery of an elevated threshold, $u_r^s$, with time constant $\tau_r^s$. (Graphs of these $\varepsilon$ and $\eta$ functions can be found in [13].) We made a minor modification to the SRM described in [13] by relaxing the constraint that $\tau_r^s = \tau_m$; smoothing the absolute refractory function is mentioned in [13] but not explicitly defined as we do here. In all simulations presented, $\Delta_{abs} = 2$ ms, $\tau_r^s = 4\tau_m$, and $\tau_r^f = 0.1\tau_m$.

The SRM we just described is deterministic. Gerstner [13] introduces a stochastic variant of the SRM (sSRM) by incorporating the notion of a stochastic firing threshold: given membrane potential $u_i(t)$, the probability density of the neuron firing at time $t$ is specified by $\rho(u_i(t))$. Herrmann & Gerstner [17] find that for a realistic escape-rate noise model the firing probability density as a function of the potential is initially small and constant, transitioning to an asymptotically linear increase around threshold $\vartheta$. In our simulations, we use such a function:

$$
\rho(v) = \frac{\beta}{\alpha} \left( \ln\left[1 + \exp(\alpha(\vartheta - v))\right] - \alpha(\vartheta - v) \right), \tag{4}
$$

where $\vartheta$ is the firing threshold in the absence of noise, $\alpha$ determines the abruptness of the constant-to-linear probability density transition around $\vartheta$, and $\beta$ determines the slope of the increasing part. Experiments with sigmoidal and exponential density functions did not qualitatively affect the results.
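As a rough illustration of Equations (2) and (4), the sketch below evaluates the PSP kernel and the firing density in Python with NumPy. The function names, the defaults for `theta`, `alpha`, and `beta`, and the convention H(0) = 1/2 are assumptions made for this example only; the two PSP time constants are the ones quoted in the Figure 2 caption.

```python
import numpy as np

TAU_S = 1.5    # PSP rise time constant in ms (Figure 2 caption)
TAU_M = 12.25  # PSP decay time constant in ms (Figure 2 caption)


def step(z):
    """Heaviside function; the value 0.5 at z == 0 is an assumed convention."""
    return np.heaviside(z, 0.5)


def psp(x, s, tau_s=TAU_S, tau_m=TAU_M):
    """Equation (2): x is time since the last post spike, s time since the pre spike."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    # Clamp arguments that the Heaviside factors zero out, to avoid overflow in exp.
    sp, xp, gap = np.maximum(s, 0.0), np.maximum(x, 0.0), np.maximum(s - x, 0.0)
    # Pre spike after the last post spike: plain difference of exponentials.
    free = (np.exp(-sp / tau_m) - np.exp(-sp / tau_s)) * step(s) * step(x - s)
    # Pre spike before the last post spike: the PSP restarts from zero at the post spike.
    reset = (np.exp(-gap / tau_s) * (np.exp(-xp / tau_m) - np.exp(-xp / tau_s))
             * step(x) * step(s - x))
    return (free + reset) / (1.0 - tau_s / tau_m)


def firing_density(v, theta=1.0, alpha=5.0, beta=0.1):
    """Equation (4); theta, alpha, beta defaults are placeholders, not fitted values."""
    z = alpha * (theta - np.asarray(v, dtype=float))
    # ln(1 + exp(z)) - z equals ln(1 + exp(-z)); logaddexp keeps both tails finite.
    return (beta / alpha) * np.logaddexp(0.0, -z)
```

Written this way, the density is close to zero well below `theta` and grows with slope close to `beta` well above it, which matches the constant-to-linear shape described in the text.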

3 Minimizing Conditional Entropy

We now derive the rule for adjusting the weight from a presynaptic neuron $j$ to a postsynaptic sSRM neuron $i$, so as to minimize the entropy of $i$'s response given a particular spike sequence from $j$. A spike sequence is described by the set of all times at which spikes have occurred within some interval between 0 and $T$, denoted $\mathcal{F}_j^T$ for neuron $j$. We assume the interval is wide enough that spikes outside the interval do not influence the state of the neuron within the interval (e.g., through threshold reset effects). We can then treat intervals as independent of each other. Let the postsynaptic neuron $i$ produce a response $\xi \in \Omega_i$, where $\Omega_i$ is the set of all possible responses given the input, $\mathcal{F}_j^T$, and $g(\xi)$ is the probability density over responses. The differential conditional entropy $h(\Omega_i)$ of neuron $i$'s response is then defined as:

$$
h(\Omega_i) = -\int_{\Omega_i} g(\xi) \log g(\xi)\, d\xi. \tag{5}
$$

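To minimize the differential conditional entropy by adjusting the neuron's weights, we compute the gradient of the conditional entropy with respect to the weights:

$$
\frac{\partial h(\Omega_i)}{\partial w_{ij}} = -\int_{\Omega_i} g(\xi)\, \frac{\partial \log(g(\xi))}{\partial w_{ij}} \left( \log(g(\xi)) + 1 \right) d\xi. \tag{6}
$$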
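The step from Equation (5) to Equation (6) can be spelled out as follows, with $\xi$ denoting a single response, $\Omega_i$ the set of responses, and $g$ the response density. The derivation assumes that differentiation with respect to $w_{ij}$ can be moved inside the integral, and it uses the product rule together with the identity $\partial g / \partial w_{ij} = g\, \partial \log g / \partial w_{ij}$:

```latex
\begin{align*}
\frac{\partial h(\Omega_i)}{\partial w_{ij}}
  &= -\int_{\Omega_i} \frac{\partial}{\partial w_{ij}}
     \bigl[ g(\xi) \log g(\xi) \bigr] \, d\xi \\
  &= -\int_{\Omega_i} \frac{\partial g(\xi)}{\partial w_{ij}}
     \bigl( \log g(\xi) + 1 \bigr) \, d\xi \\
  &= -\int_{\Omega_i} g(\xi) \, \frac{\partial \log g(\xi)}{\partial w_{ij}}
     \bigl( \log g(\xi) + 1 \bigr) \, d\xi .
\end{align*}
```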

For a differentiable neuron model, $\partial \log(g(\xi)) / \partial w_{ij}$ can be expressed as follows when neuron $i$ fires once at time $\hat{f}_i$ [18]:

$$
\frac{\partial \log(g(\xi))}{\partial w_{ij}} = \int_{t=0}^{T} \frac{\partial \rho(u_i(t))}{\partial u_i(t)}\, \frac{\partial u_i(t)}{\partial w_{ij}}\, \frac{\delta(t - \hat{f}_i) - \rho(u_i(t))}{\rho(u_i(t))}\, dt, \tag{7}
$$

where $\delta(\cdot)$ is the Dirac delta, and $\rho(u_i(t))$ is the firing probability-density of neuron $i$ at time $t$. (See [18] for the generalization to multiple postsynaptic spikes.) With the sSRM we can compute the partial derivatives $\partial \rho(u_i(t)) / \partial u_i(t)$ and $\partial u_i(t) / \partial w_{ij}$. Given the density function (4),

$$
\frac{\partial \rho(u_i(t))}{\partial u_i(t)} = \frac{\beta}{1 + \exp(\alpha(\vartheta - u_i(t)))}, \qquad \frac{\partial u_i(t)}{\partial w_{ij}} = \varepsilon(t - \hat{f}_i,\; t - f_j).
$$

To perform gradient descent in the conditional entropy, we use the weight update

$$
\Delta w_{ij} \propto -\frac{\partial h(\Omega_i)}{\partial w_{ij}} = \int_{\Omega_i} g(\xi) \left( \log(g(\xi)) + 1 \right) \left[ \int_{t=0}^{T} \frac{\beta\, \varepsilon(t - \hat{f}_i,\; t - f_j) \left( \delta(t - \hat{f}_i) - \rho(u_i(t)) \right)}{\left( 1 + \exp(\alpha(\vartheta - u_i(t))) \right) \rho(u_i(t))}\, dt \right] d\xi. \tag{8}
$$

We can use numerical methods to evaluate Equation (8). However, it seems biologically unrealistic to suppose a neuron can integrate over all possible responses $\xi$. This dilemma can be circumvented in two ways. First, the resulting learning rule might be cached in some form through evolution so that the full computation is not necessary (e.g., in an STDP curve). Second, the specific response produced by a neuron on a single trial might be considered to be a sample from the distribution $g(\xi)$, and the integration is performed by a sampling process over repeated trials; each trial would produce a stochastic gradient step.

Figure 2: (a) Experimental setup of Zhang et al. and (b) their experimental STDP curve (small squares) vs. our model (solid line). Model parameters: $\tau_s = 1.5$ ms, $\tau_m = 12.25$ ms.
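Equation (7) can be checked numerically on a time grid. The sketch below is a simplified illustration, not the simulation used in the paper: it ignores the refractory term and the PSP reset, uses a single pre spike, and picks placeholder values for the firing-density parameters and the grid. It compares the discretized Equation (7) against a finite-difference derivative of the log-density of a response with exactly one post spike.

```python
import numpy as np

# All numerical values are illustrative assumptions, not fitted parameters.
TAU_S, TAU_M = 1.5, 12.25           # PSP time constants in ms (Figure 2 caption)
THETA, ALPHA, BETA = 1.0, 5.0, 0.1  # placeholder firing-density parameters
DT, T = 0.1, 100.0                  # time step and trial length in ms
t = np.arange(0.0, T, DT)


def psp(s):
    """Difference-of-exponentials PSP; the post-spike reset of the PSP is ignored."""
    sp = np.maximum(s, 0.0)
    return (np.exp(-sp / TAU_M) - np.exp(-sp / TAU_S)) / (1.0 - TAU_S / TAU_M)


def rho(u):
    """Firing density of Equation (4), written in a numerically stable form."""
    return (BETA / ALPHA) * np.logaddexp(0.0, -ALPHA * (THETA - u))


def drho_du(u):
    """Derivative of rho with respect to the membrane potential."""
    return BETA / (1.0 + np.exp(ALPHA * (THETA - u)))


def log_density(w, t_pre, k_post):
    """log g for the response 'exactly one post spike, in time bin k_post'."""
    u = w * psp(t - t_pre)
    return np.log(rho(u[k_post])) - np.sum(rho(u)) * DT


def grad_log_density(w, t_pre, k_post):
    """Equation (7), with the Dirac delta replaced by an indicator divided by DT."""
    eps = psp(t - t_pre)
    u = w * eps
    delta = np.zeros_like(t)
    delta[k_post] = 1.0 / DT
    return np.sum(drho_du(u) * eps * (delta - rho(u)) / rho(u)) * DT


w, t_pre, k_post, h = 0.5, 20.0, 300, 1e-6
finite_diff = (log_density(w + h, t_pre, k_post)
               - log_density(w - h, t_pre, k_post)) / (2.0 * h)
print(finite_diff, grad_log_density(w, t_pre, k_post))
```

In this reduced setting the gradient of the log-density is positive when the post spike falls shortly after the pre spike and negative when it precedes it. This is only the inner term of Equation (8); the full update also weights each response by $g(\xi)(\log g(\xi) + 1)$.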

4 Simulation Methodology

We model in detail the experiment of Zhang et al. [12] (Figure 2a). In this experiment, a post neuron is identified that has two neurons projecting to it, call them the pre and the driver. The pre is subthreshold: it produces depolarization but no spike. The driver is suprathreshold: it induces a spike in the post. Plasticity of the pre-post synapse is measured as a function of the timing between pre and post spikes ($\Delta t_{\text{pre-post}}$) by varying the timing between induced spikes in the pre and the driver ($\Delta t_{\text{pre-driver}}$). This measurement yields the well-known STDP curve (Figure 1b).

The experiment imposes several constraints on a simulation: the driver alone causes spiking > 70% of the time, the pre alone causes spiking < 10% of the time, synchronous firing of driver and pre causes LTP if and only if the post fires, and the time constants of the EPSPs ($\tau_s$ and $\tau_m$ in the sSRM) are in the range of 1-3 ms and 10-15 ms respectively. These constraints remove many free parameters from our simulation. We do not explicitly model the two input cells; instead, we model the EPSPs they produce. The magnitudes of these EPSPs are picked to satisfy the experimental constraints: the driver EPSP alone causes a spike in the post on 77.4% of trials, and the pre EPSP alone causes a spike on fewer than 0.1% of trials. Free parameters of the simulation are $\vartheta$ and $\beta$ in the spike-probability function ($\alpha$ can be folded into $\vartheta$), and the magnitudes ($u_r^s$, $u_{abs}$) and time constants ($\tau_r^s$, $\tau_r^f$, $\Delta_{abs}$) of the reset.

The independent variable of the simulation is $\Delta t_{\text{pre-driver}}$, and we measure the time of the post spike to determine $\Delta t_{\text{pre-post}}$. We estimate the weight update for a given $\Delta t_{\text{pre-driver}}$ using Equation 8, approximating the integral by a summation over all time-discretized output responses consisting of 0, 1, or 2 spikes. Three or more spikes have a probability that is vanishingly small.
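The summation over time-discretized responses with 0, 1, or 2 spikes can be sketched as below. This is a toy version under loudly stated assumptions: all parameter values, the EPSP weights, the single-exponential refractory kernel (standing in for Equation (3)), the omission of the PSP reset, the use of the discrete entropy of the binned response, and the finite-difference gradient (standing in for the analytic Equation (8)) are choices made for the example, and the function names are hypothetical.

```python
import itertools
import numpy as np

# Illustrative assumptions: see the paragraph above; none of these are fitted values.
TAU_S, TAU_M = 1.5, 12.25            # PSP time constants in ms (Figure 2 caption)
THETA, ALPHA, BETA = 1.0, 5.0, 0.2   # placeholder firing-density parameters
U_REFR, TAU_REFR = -5.0, 20.0        # placeholder refractory amplitude and decay (ms)
DT, N_BINS = 1.0, 60                 # bin width in ms and number of bins
t = np.arange(N_BINS) * DT


def psp(s):
    """Difference-of-exponentials PSP, zero before the presynaptic spike."""
    sp = np.maximum(s, 0.0)
    return (np.exp(-sp / TAU_M) - np.exp(-sp / TAU_S)) / (1.0 - TAU_S / TAU_M)


def rho(u):
    """Firing density of Equation (4) in a numerically stable form."""
    return (BETA / ALPHA) * np.logaddexp(0.0, -ALPHA * (THETA - u))


def response_log_prob(spike_bins, w_pre, t_pre, w_drv, t_drv):
    """Log-probability of the response that spikes in exactly the given bins."""
    u = w_pre * psp(t - t_pre) + w_drv * psp(t - t_drv)
    refr = np.zeros(N_BINS)
    for k in sorted(spike_bins):
        # Only the most recent post spike contributes, so later spikes overwrite.
        later = t > t[k]
        refr[later] = U_REFR * np.exp(-(t[later] - t[k]) / TAU_REFR)
    rate = rho(u + refr) * DT
    fired = np.zeros(N_BINS, dtype=bool)
    fired[np.array(spike_bins, dtype=int)] = True
    # P(spike in bin) = 1 - exp(-rate); P(no spike in bin) = exp(-rate).
    return np.sum(np.where(fired, np.log(-np.expm1(-rate)), -rate))


def response_entropy(w_pre, t_pre, w_drv=3.0, t_drv=25.0, max_spikes=2):
    """Entropy over binned responses with at most max_spikes spikes, and their mass."""
    log_p = np.array([
        response_log_prob(bins, w_pre, t_pre, w_drv, t_drv)
        for n in range(max_spikes + 1)
        for bins in itertools.combinations(range(N_BINS), n)
    ])
    p = np.exp(log_p)
    return -np.sum(p * log_p), np.sum(p)


def weight_update(w_pre, t_pre, step=1e-3):
    """Descent direction -dh/dw by central differences (not the analytic Equation 8)."""
    h_plus, _ = response_entropy(w_pre + step, t_pre)
    h_minus, _ = response_entropy(w_pre - step, t_pre)
    return -(h_plus - h_minus) / (2.0 * step)


entropy, mass = response_entropy(w_pre=0.3, t_pre=20.0)
print(entropy, mass, weight_update(0.3, 20.0))
```

The returned probability mass indicates how much of the response distribution the 0-, 1-, and 2-spike responses cover; a value close to 1 is what justifies truncating the sum at two spikes. Sweeping `t_pre` relative to `t_drv` would trace out a plasticity curve for this toy model, which should not be expected to match the fitted curve in Figure 2.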
