{"title": "Plasticity Kernels and Temporal Statistics", "book": "Advances in Neural Information Processing Systems", "page_first": 1303, "page_last": 1310, "abstract": "", "full_text": "Plasticity Kernels and Temporal Statistics \n\nPeter Dayan1 Michael Hausser2 Michael London1\u00b72 \n\n1 GCNU, 2WIBR, Dept of Physiology \nUCL, Gower Street, London \n\ndayan@gats5y.ucl.ac.uk {m.hausser,m.london}@ucl.ac.uk \n\nAbstract \n\nComputational mysteries surround the kernels relating the \nmagnitude and sign of changes in efficacy as a function of \nthe time difference between pre- and post-synaptic activity at \na synapse. One important idea34 is that kernels result from fil(cid:173)\ntering, ie an attempt by synapses to eliminate noise corrupting \nlearning. This idea has hitherto been applied to trace learning \nrules; we apply it to experimentally-defined kernels, using it to \nreverse-engineer assumed signal statistics. We also extend it to \nconsider the additional goal for filtering of weighting learning \naccording to statistical surprise, as in the Z-score transform. \nThis provides a fresh view of observed kernels and can lead to \ndifferent, and more natural, signal statistics. \n\n1 Introduction \n\nSpeculation and data that the rules governing synaptic plasticity should in(cid:173)\nclude a very special role for timel,7,13,17,20,21,23,24,26,27,31,32,3S was spectac-\nularly confirmed by a set of highly influential experiments4\u00b7S,ll,l6,25 show(cid:173)\ning that the precise relative timing of pre-synaptic and post-synaptic ac(cid:173)\ntion potentials governs the magnitude and sign of the resulting plasticity. 
\nThese experimentally-determined rules (usually called spike-time dependent \nplasticity or STDP rules), which are constantly being refined,18,3o have in(cid:173)\nspired substantial further theoretical work on their modeling and interpreta(cid:173)\ntion.2\u00b79,l0,22\u00b728\u00b729\u00b733 Figure l(Dl-Gl)* depict some of the main STDP findings/ \nof which the best-investigated are shown in figure l(Dl;El), and are variants of \na 'standard' STDP rule. Earlier work considered rate-based rather than spike(cid:173)\nbased temporal rules, and so we adopt the broader term 'time dependent plas(cid:173)\nticity' or TDP. Note the strong temporal asymmetry in both the standard rules. \n\nAlthough the theoretical studies have provided us with excellent tools for mod(cid:173)\neling the detailed consequences of different time-dependent rules, and under(cid:173)\nstanding characteristics such as long-run stability and the relationship with \nnon-temporal learning rules such as BCM,6 specifically computational ideas \nabout TDP are rather thinner on the ground. Two main qualitative notions \nexplored in various of the works cited above are that the temporal asymme(cid:173)\ntries in TDP rules are associated with causality or prediction. However, look(cid:173)\ning specifically at the standard STDP rules, models interested in prediction \n\n*We refer to graphs in this figure by row and column. \n\n\fconcentrate mostly on the L 1P component and have difficulty explaining the \npredsely-timed nature of the LTD. Why should it be particularly detrimental to \nthe weight of a synapse that the pre-synaptic action potential comes just after \na post-synaptic action-potential, rather than 200ms later, for instance?. 
In the \ncase of time-difference or temporal difference rules, 29\u202232 why might the LTD \ncomponent be so different from the mirror reflection of the L1P component \n(figure 1(\u00a31)), at least short of being tied to some particular biophysical char(cid:173)\nacteristic of the post-synaptic cell. We seek alternative computationally-sound \ninterpretations. \nWallis & Baddeley34 formalized the intuition underlying one class of TDP rules \n(the so-called trace based rules, figure l(A1)) in terms of temporal filtering. In \ntheir model, the actual output is a noisy version of a 'true' underlying signal. \nThey suggested, and showed in an example, that learning proceeds more profi(cid:173)\nciently if the output is filtered by an optimal noise-removal filter (in their case, \na Wiener filter) before entering into the learning rule. This is like using a prior \nover the signal, and performing learning based on the (mean) of the posterior \nover the signal given the observations (ie the output). If objects in the world \nnormally persist for substantial periods, then, under some reasonable assump(cid:173)\ntions about noise, it turns out to be appropriate to apply a low-pass filter to \nthe output. One version of this leads to a trace-like learning rule. \nOf course, as seen in column 1 of figure 1, TDP rules are generally not trace(cid:173)\nlike. Here, we extend the Wallis-Baddeley (WB) treatment to rate-based versions \nof the actual rules shown in the figure. We consider two possibilities, which in(cid:173)\nfer optimal signal models from the rules, based on two different assumptions \nabout their computational role. One continues to regard them as Wiener filters. \nThe other, which is closely related to recent work on adaptation and modula(cid:173)\ntion, 3\u2022 s, 15\u2022 36 has the kernel normalize frequency components according to their \nstandard deviations, as well as removing noise. 
Under this interpretation, the \nlearning signal is a Z-score-transformed veFsion of the output. \n\nIn section 2, we describe the WB model. In section 3, we extend this model to \nthe case of the observed rules for synaptic plasticity. \n\n2 Filtering \n\nConsider a set of pre-synaptic inputs i E { 1 ... n} with firing rates Xi ( t) at time \nt to a neuron with output rate y ( t). A general TDP plasticity rule suggests that \nsynaptic weight Wi should change according to the correlation between input \nXi(t) and output y(t), through the medium of a temporal filter cf>(s) \n\n~wi ex: J dtxi(t) {J dt'y(t')cf>(t-t')} = J dt'y(t'){J dtxi(t)cp(t-t')} \n\n(1) \n\nProvided the temporal filters for each synapse on a single post-synaptic cell \nare the same, equation 1 indicates that pre-synaptic and post-synaptic filtering \nhave essentially the same effect. \nWB34 consider the case that the output can be decomposed as y(t) = s(t) + \nn(t), where s(t) is a 'true' underlying signal and n(t) is noise corrupting the \nsignal. They suggest defining the filter so that s(t) = f dt' y(t')cf>(t-t') is the \noptimal least-squares estimate of the signal. Thus, learning would be based on \nthe best available information about the signal s ( t). If signal and noise are sta(cid:173)\ntistically stationary signals, with power spectra IS ( w) 12 and IN ( w) 12 respec(cid:173)\ntively at (temporal) frequency w, then the magnitude of the Fourier transform \n\n\fCD \nkernel \n\n0 \nkernel \n\nspectrum \n\n(A]~ \n;\u00a7 \n:o \n...... \n\nt \n\n[[] \nsignal \n\nspectrum \n\n!wiener I \n\n[!] \nsignal \n\nspectrum \n\n!white I \n\n~ \n\nw \n\n\u00a9j \n\nL L klrug~N~ \n\n0 \n\nt \n\nw \n\nw \n\nw \n\nFigure 1: Time-dependent plastidty rules. 
The rows are for various suggested rules \n(A; 17 B;23 D;25 E;16 F; 2 G, 14 from Abbott & Nelson2); the columns show: (1) the kernels in \ntime t; (2) their temporal power spectra as a function of frequency w; (3) signal power \nS ( w) as function of w assuming the kernels are derived from the underlying Wiener \nfilter; (4) signal power S(w) assuming the kernels are derived from the noise-removal \nand whitening filter. Different kernels also have different phase spectra. See text for \nmore details. The ordinates of the plots have been individually normalized; but the \nabscissce for all the temporal (t) plots and, separately, all the the spectral (w) plots, \nare the same, for the purposes of comparison. Numerical scales are omitted to focus \non structural characteristics. In the text, we refer to individual graphs in this figure by \ntheir row letter (A-G) and column number (1-4). \n\n\fof the (Wiener) filter is \n\nI
t'. \n\nIn the context of input coming from visually presented objects, WB suggest us(cid:173)\ning white noise N ( w) = N, 'if w, and consider two possibilities for S ( w), based \non the assumptions that objects persist for either fixed, or randomly variable, \nlengths of time. We summarize their main result in the first three rows of \nfigure 1. Figure l(A3) shows the assumed, scale-free, magnitude spectrum \nIS(w)l = 1/w for the signal. Figure l(Al) shows the (truly optimal) purely \ncausal version of the filter that results - it can be shown to involve exactly \nan exponential decay, with a rate constant which depends on the level of the \nnoise N. In WB's self-supervised setting, it is rather unclear a priori whether \nthe assumption of white noise is valid; WB's experiments bore it out to a rough \napproximation, and showed that the filter of figure l(Al) worked well on a task \ninvolving digit representation and recognition. \n\nFigure l(Bl;B3) repeat the analysis, with the same signal spectrum, but for \nthe optimal purely acausal filter as used in reinforcement learning's synaptic \neligibility traces. Of course, the true TDP kernels (shown in figure l(Dl-Gl)) \nare neither purely casual nor acausal; figure l(Cl) shows the normal low pass \nfilter that results from assuming phase 0 for all frequency components. \n\nAlthough the WB filter of figure l(Cl) somewhat resembles a Hebbian version \nof the anti-Hebbian rule for layer IV spiny stellate cells shown in figure l(Gl), \nit is clearly not a good match for the standard forms of TDP. One might also \nquestion the relationship between the time constants of the kernels and the \nsignal spectrum that comes from object persistence. The next section consid(cid:173)\ners two alternative possibilities for interpreting TDP kernels. 
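The rate-based rule of equation 1 translates directly into discrete time. The sketch below (our own illustration, not code from the paper; the kernel shape, rate traces, and grid are toy choices) computes the weight change for an antisymmetric, STDP-like kernel, and checks the point made above that filtering the post-synaptic side or the pre-synaptic side gives the same Δw:

```python
import numpy as np

def delta_w(x, y, phi, dt=1.0):
    """Discrete form of equation 1: Delta-w is proportional to
    sum_t x(t) * sum_t' y(t') * phi(t - t').
    phi is sampled on lags -L*dt..+L*dt (odd length, centre = lag 0)."""
    # (y filtered by phi)(t) = sum_t' y(t') phi(t - t') dt
    y_filt = np.convolve(y, phi, mode='same') * dt
    return np.sum(x * y_filt) * dt

rng = np.random.default_rng(0)
dt = 1.0
lags = np.arange(-50, 51) * dt
tau = 10.0
phi = np.sign(lags) * np.exp(-np.abs(lags) / tau)  # antisymmetric, STDP-like

x = rng.random(500)   # pre-synaptic rate trace
y = rng.random(500)   # post-synaptic rate trace

dw_post = delta_w(x, y, phi, dt)                      # filter the output
x_filt = np.convolve(x, phi[::-1], mode='same') * dt  # filter the input instead
dw_pre = np.sum(y * x_filt) * dt
assert np.isclose(dw_post, dw_pre)
```

Filtering the input requires the time-reversed kernel, which is one way to see why the equivalence only holds when all synapses on the cell share one filter.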
\n\n3 Signalling and Whitening \n\nThe main intent of this paper is to combine WB's idea about the role of filtering \nin synaptic plasticity with the actual forms of the kernels that have been re(cid:173)\nvealed in the experiments. Under two different models for the computational \ngoal of filtering, we work back from the experimental kernels to the implied \nforms of the statistics of the signals. The first method employs WB's Wiener \nfiltering idea. The second method can be seen as using a more stringent defin(cid:173)\ntion of statistical significance. \n\n\f~ phase for E1 ~ phase=-\u00a5 \n\n(g DoG kernel @] DoG signals \n\nb ,, \n\n\\ \n\\ Wiener \n\nw \n\nt \n\nt \n\nw \n\nFigure 2: Kernel manipulation. A) The phase spectrum (ie kernel phase as a function \nof frequency) for the kernel (shown in figure l(El)) with asymmetric LTP and LTD.l 6 \nB) The kernel that results from the power spectrum of figure l(E2) but constant phase \n-rr/2. This kernel has symmetric LTP and LTD, with an intermediate time constant. \nC) Plasticity kernel that is exactly a difference of two Gaussians (DoG; compare fig(cid:173)\nure l(Fl)). White (solid; from equation 4) and Wiener (dashed; from equation 3) signal \nspectra derived from the DoG kernel in (C). Here, the signal spectrum in the case of \nwhitening has been vertically displaced so it is clearer. Both signal spectra show dear \nperiodicities. \n\n3.1 Reverse engineering signals from Wiener filtering \n\nAccepting equation 2 as the form of the filter (note that this implies that \nlci>(w)l ~ 1), and, with WB, making the assumption that the noise is white, \nso IN(w)l = N, Vw, the assumed amplitude spectrum of the signal process \ns(t) is \n\n(3) \nImportantly, the assumed power of the noise does not affect the form of the \nsignal power, it only scales it. \n\nIS(w)l = N~lci>(w)l/(1-lci>(i.o)i). 
\n\nFigure 1(D2-G2) shows the magnitude of the Fourier transform of the experi(cid:173)\nmental kernels (which are shown in figure 1(D1-G 1)), and figure 1(D3-G3) show \nthe implied signal spectra. Since there is no natural data that specify the abso(cid:173)\nlute scale of the kernels (ie the maximum value of lci>(w) 1), we set it arbitrarily \nto 0.5. Any value less than ~ 0.9 leads to similar predictions for the signal \nspectra. We can relate figure 1(D3-G3) to to the heuristic criteria mentioned \nabove for the signal power spectrum. In two cases (D3;F3), the clear peaks \nin the signal power spectra imply strong periodicities. For layer V pyramids \n(D3), the time constant for the kernel is ~ 20ms, implying a peak frequency of \nw =50Hz in they band. In the hippocampal case, the frequency may be a lit(cid:173)\ntle lower. Certainly, the signal power spectra underlying the different kernels \nhave quite different forms. \n\n3.2 Reverse engineering signals from whitening \n\nWB's suggestion that the underlying signal s(t) should be extracted from the \noutput y ( t) far from exhausts the possibilities for filtering. In particular, there \nhave been various suggestions36 that learning should be licensed by statistical \nsurprise, ie according to how components of the output differ from expecta(cid:173)\ntions. A simple form of this that has gained recent currency is the Z-score \ntransformation,8\u202215\u202236 which implies considering components of the signal in \nunits of (ie normalized by) their standard deviations. Mechanistically, this is \nclosely related to whitening in the face of input noise, but with a rather differ(cid:173)\nent computational rationale. \n\n\fA simple formulation of a noise-sensitive Z-score is Dong & Atick's12 whitening \nfilter. Under the same formulation as WB (equation 2), this suggests multiply(cid:173)\ning the Wiener filter by 1 I IS ( w) I, giving \n\nI(w)l = IS(w)II(IS(w)l 2 + N(w) 2). 
\n\n(4) \n\nAs in equation 3, it is possible to solve for the signal power spectra implied \nby the various kernels. The 4th column of figure 1 shows the result of doing \nthis for the experimental kernels. In particular, it shows that the clear spectral \npeaks suggested by the Wiener filter (in the 3rd column) may be artefactual -\nthey can arise from a form of whitening. Unlike the case of Wiener filtering, the \nsignal statistics derived from the assumption of whitening have the common \ncharacteristic of monotonically decreasing signal powers as a function of fre(cid:173)\nquency w, which is a common finding for natural scene statistics, for instance. \n\nThe case of the layer V pyramids25 (row Din figure 1) is particularly clear. If \nthe time constants of potentiation (LTP) and depression (LTD) are T, and LTP \nand LTD are matched, then the Fourier transform of the plasticity kernel is \n\n W) = -\n\n( \n\n1 \n\nVEi \n\n( \n\n1 \n\niw + 1. \n\nT \n\n+ \n\n) \n\n1 \n\niw - 1. \nT \n\n-~ w \n\n= -t -\n\nrr w2 + ~ \n\nT \n\n. 2~ tJ \n\n= -tT \n\n-\nrr ~ + T2 \n\nW \n\n( ) \n5 \n\nwhich is exactly the form of equation 4 for S ( w) = 1 I w (which is duly shown \nin figure 1(D4)). Note the factor of -i in (w). This is determined by the \nphases of the frequency components, and comes from the anti-symmetry of \nthe kernel. The phase of the components (L(w) = -rr 12, by one convention) \nimplies the predictive nature of the kernel: Xi(t) is being correlated with led \n(ie future) values of noise-filtered, significance-normalized, outputs. \n\nThe other cases in figure 1 follow in a similar vein. Row E, from cortical layer \nII/ll, with its asymmetry between LTP and LID, has similar signal statistics, \nbut with an extra falloff constant w0, making S(w) = 11(w + w 0 ). Also, it \nhas a phase spectrum L(w) which is not constant with w (see figure 2A). 
\nRow F, from hippocampal GABAergic cells in culture, has a form that can arise \nfrom an exponentially decreasing signal power and little assumed noise (small \nN ( w) ). Conversely, row G, in cortical layer IV spiny-stellate cells, arises from \nthe same signal statistics, but with a large noise term N(w). Unlike the case \nof the Wiener filter (equation 3), the form of the signal statistics, and not just \ntheir magnitude, depends on the amount of assumed noise. \n\nFigure 2B-C show various aspects of how these results change with the param(cid:173)\neters or forms of the kernels. Figure 2B shows that coupling the power spec(cid:173)\ntrum (of figure 1E2) for the rule with asymmetric LTP and LTD with a constant \nphase spectrum (-rr 12) leads to a rule with the same filtering characteristic, \nbut with symmetric LTP and LTD. The phase spectrum concerns the predictive \nrelationship between pre- and post-synaptic frequency components; it will be \ninteresting to consider the kernels that result from other temporal relation(cid:173)\nships between pre- and post-synaptic activities. Figure 2C shows the kernel \ngenerated as a difference of two Gaussians (DoG). Although this kernel resem(cid:173)\nbles that of figure 1F1, the signal spectra (figure 2D) calculated on the basis of \nwhitening (solid; vertically displaced) or Wiener filtering (dashed) are similar \nto each other, and both involve strong periodicity near the spectral peak of the \nkernel. \n\n\f4 Discussion \n\nTemporal asymmetries in synaptic plasticity have been irresistibly alluring to \ntheoretical treatments. We followed the suggestion that the kernels indicate \nthat learning is not based on simple correlation between pre- and post -synaptic \nactivity, but rather involves filtering in the light of prior information, either to \nremove noise from the signals (Wiener filtering), or to remove noise and boost \ncomponents of the signals according to their statistical significance. 
\n\nAdopting this view leads to new conclusions about the kernels, for instance re(cid:173)\nvealing how the phase spectrum differentiates rules with symmetric and asym(cid:173)\nmetric potentiation and depression components (compare figures l(El); 2B). \nMaking some further assumptions about the characteristics of the assumed \nnoise, it permits us to reverse engineer the assumed statistics of the signals, ie \nto give a window onto the priors at synapses or cells (columns 3;4 of figure 1). \nStructural features in these signal statistics, such as strong periodicities, may \nbe related to experimentally observable characteristics such as oscillatory ac(cid:173)\ntivity in relevant brain regions. Most importantly, on this view, the detailed \ncharacteristics of the filtering might be expected to adapt in the light of pat(cid:173)\nterns of activity. This suggests the straightforward experimental test of ma(cid:173)\nnipulating the input and/or output statistics and recording the consequences. \n\nVarious characteristics of the rules bear comment. Since we wanted to focus on \nstructural features of the rules, the graphs in the figures all lack precise time \nor frequency scales. In some cases we know the time constants of the kernels, \nand they are usually quite fast (on the order of tens of milliseconds). This can \nsuggest high frequency spectral peaks in assumed signal statistics. However, it \n\u00b7also hints at the potential inadequacy of our rate-based treatment that we have \ngiven, and suggests the importance of a spike-based treatment. 22\u2022 30 Recent \nevidence that successive pairs of pre- and post-synaptic spikes do not interact \nadditively in determining the magnitude and direction of plasticity18 make the \naveraging inherent in the rate-based approximation less appealing. Further, \nwe commented at the outset that pre- and post-synaptic filtering have similar \neffects, provided that all the filters on one post-synaptic cell are the same. 
If \nthey are different, then synapses might well be treated as individual filters, \nascertaining important signals for learning. In our framework, it is interesting \nto speculate about the role of (pre-)synaptic depression itself as a form of noise \nfilter (since noise should be filt\u20acred before it can affect the activity of the post(cid:173)\nsynaptic cell, rather than just its plasticity); leaving the kernel as a significance \nfilter, as in the whitening treatment. Finally, largely because of the separate \nroles of signal and noise, we have been unable to think of a simple experiment \nthat would test between Wiener and whitening filtering. However, it is a quite \ncritical issue in further exploring computational accounts of plasticity. \n\nAcknowledgements \n\nWe are very grateful to Odelia Schwartz for helpful discussions. Funding was \nfrom the Gatsby Charitable Foundation, the Wellcome Trust (MH) and an HFSP \nLong Term Fellowship (ML). \n\nReferences \n\u00b7 [1] Abbott, LF, & Blum, KI (1996) Functional significance of long-term potentiation for sequence learning \n\nand prediction. Cerebral Cortex 6:406-416. \n\n[2] Abbott, LF & Nelson, SB (2000) Synaptic plasticity: taming the beast. Nature Neuroscience 3:1178-1183. \n\n\f[3] Atick, JJ, li, Z, & Redlich, AN (1992) Understanding retinal color coding from first principles. Neural \n\nComputation 4:559-572. \n\n[4] Bell, CC, Han, VZ, Sugawara, Y & Grant K (1997) Synaptic plasticity in a cerebellum-like structure \n\ndepends on temporal order. Nature 387:278-81. \n\n[5] Bi, GQ & Poo, MM (1998) Synaptic modifications in cultured hippocampal neurons: dependence on \n\nspike timing, synaptic strength, and postsynaptic cell type. Journal of Neurosdence 18:10464-10472. \n[6] Bienenstock, EL, Cooper, IN, & Munro, PW (1982) Theory for the development of neuron selectivity: \n\nOrientation specificity and binocular interaction in visual cortex. Journal of Neuroscience 2:32-48. 
\n\n[7] Blum, KI, & Abbott, LF (1996) A model of spatial map formation in the hippocampus of the rat. Neural \n\nComputation 8:85-93. \n\n[8] Buiatti, M & Van Vreeswijk, C (2003) Variance normalisation: a key mechanism for temporal adaptation \n\nin natural vision? VISion Research, in press. \n\n[9] Cateau, H & Fukai, T (2003) A stochastic method to predict the consequence of arbitrary forms of \n\nspike-timing-dependent plasticity. Neural Computation 15:597-620. \n\n[1 0] Chechik, G (2003). Spike time dependent plasticity and information maximization. Neural Computation \n\nin press. \n\n[11] Debanne, D, Gahwiler, BH & Thompson, SM (1998) Long-term synaptic plasticity between pairs of \n\nindividual CA3 pyramidal cells in rat hippocampal slice cultures. Journal of Physiology 507:237-247. \n\n[12] Dong, DW, & A tick, JJ (1995) Temporal decorrelation: A theory of lagged and nonlagged responses in \n\nthe lateral geniculate nucleus. Network: Computation in Neural Systems 6:159-178. \n\n[13] Edelman, S & Weinshall, D (1991) A self-organizing multiple-view representation of 3D objects. Biolog(cid:173)\n\nical Cybernetics 64:209-219. \n\n[14] Egger, V, Feldmeyer, D & Sakmann, B (1999) Coincidence detection and changes of synaptic efficacy in \n\nspiny stellate neurons in rat barrel cortex. Nature Neurosdence 2:1098-1105. \n\n[15] Fairhall, AL, Lewen, GD, Bialek, W & de Ruyter Van Steveninck, RR (2001) Efficiency and ambiguity in \n\nan adaptive neural code. Nature 412:787-792. \n\n[16] Feldman, DE (2000) Timing-based LTP and LTD at vertical inputs to layer II/III pyramidal cells in rat \n\nbarrel cortex. Neuron 27:45-56. \n\n[17] Foldiflk, P (1991) Learning invariance from transformed sequences. Neural Computation 3:194-200. \n[18] Froemke, RC & Dan, Y (2002) Spike-timing-dependent synaptic modification induced by natural spike \n\ntrains. Nature 416:433-438. 
\n\n[19] Ganguly K, Kiss, L & Poo, M (2000) Enhancement of presynaptic neuronal excitability by correlated \n\npresynaptic and postsynaptic spiking. Nature Neuroscience 3:1018-1026. \n\n[20] Gerstner, W & Abbott, LF (1997) Learning navigational maps through potentiation and modulation of \n\nhippocampal place cells. Journal of Computational Neurosdence 4:79-94. \n\n[21] Gerstner, W, Kempter, R, van Hemmen, JL & Wagner, H (1996) A neuronal learning rule for sub(cid:173)\n\nmillisecond temporal coding. Nature 383:76-81. \n\n[22] Gerstner, W & Kistler, WM (2002) Mathematical formulations ofHebbianlearning. Biological Cybernetics \n\n87:404-15 .. \n\n[23] Hull, CL (1943) Prindples of Behavior New York, NY: Appleton-Century. \n[24] Levy, WB & Steward, D (1983) Temporal contiguity requirements for long-term associative potentia(cid:173)\n\ntion/depression in the hippocampus. Neurosdence 8:791-797 \n\n[2 5] Markram, H, Lubke, J, Frotscher, M, & Sakmann, B (1997) Regulation of synaptic efficacy by coincidence \n\nof postsynaptic APs and EPSPs. Science 275:213-215. \n\n[26] Minai, AA, & Levy, WB (1993) Sequence learning in a single trial. International Neural Network Society \n\nWorld Congress of Neural Networks II. Portland, OR: International Neural Network Society, 505-508. \n\n[2 7] Pavlov, PI (192 7) Conditioned Reflexes Oxford, England, OUP. \n[28] Porr, B & Worgotter, F (2003) Isotropic sequence order learning. Neural Computation 15:831-864. \n[29] Rao, RP & Sejnowski, TJ (2001) Spike-timing-dependentHebbian plasticity as temporal difference learn(cid:173)\n\ning. Neural Computation 13:2221-2237. \n\n\u00b7 \n\n[30] Sjostrom, PJ, Turrigiano, GG & Nelson, SB (2001) Rate, timing, and cooperativity jointly determine \n\ncortical synaptic plasticity. Neuron 32:1149-1164. \n\n[31] Sutton, RS (1988) Learning to predict by the methods of temporal difference. Machine Learning 3:9-44. 
\n[3 2] Sutton, RS & Barto, AG (1981) Toward a modern theory of adaptive networks: Expectation and predic(cid:173)\n\ntion. Psychological Review 88:135-170. \n\n[33] van Rossum, MC, Bi, GQ & Turrigiano, GG (2000) Stable Hebbian learning from spike timing-dependent \n\nplasticity. journal of Neurosdence 20:8812-21. \n\n[34] Wallis, G & Baddeley, R (1997) Optimal, unsupervised learning in invariant object recognition. Neural \n\nComputation 9:883-894. \n\n[35] Wallis, G & Rolls, ET (1997). Invariant face and object recognition in the visual system. it Progress in \n\n[36] Yu, AJ & Dayan, P (2003) Expected and unexpected uncertainty: ACh & NE in the neocortex. In NIPS \n\nNeurobiology 51:167-194. \n\n2002 Cambridge, MA: MIT Press. \n\n\f", "award": [], "sourceid": 2392, "authors": [{"given_name": "Peter", "family_name": "Dayan", "institution": null}, {"given_name": "Michael", "family_name": "H\u00e4usser", "institution": null}, {"given_name": "Michael", "family_name": "London", "institution": null}]}