{"title": "Spike-Timing-Dependent Learning for Oscillatory Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 152, "page_last": 158, "abstract": null, "full_text": "Spike-Timing-Dependent  Learning for \n\nOscillatory  Networks \n\nSilvia Scarp etta \n\nDept.  of Physics  \"E.R.  Caianiello\" \nSalerno  University 84081  (SA)  Italy \nand INFM, Sezione di Salerno Italy \n\nZhaoping Li \n\nGatsby Compo  Neurosci.  Unit \n\nUniversity College,  London,  WCIN 3AR \n\nUnited  Kingdom \n\nscarpetta@na. infn. it \n\nzhaoping@gatsby.ucl.ac.uk \n\nJohn Hertz \n\nNordita \n\n2100  Copenhagen 0, Denmark \n\nheriz@nordita.dk \n\nAbstract \n\nWe  apply to oscillatory networks a  class of learning rules in which \nsynaptic weights change proportional to pre- and post-synaptic ac(cid:173)\ntivity,  with  a  kernel  A(r) measuring the effect  for  a  postsynaptic \nspike a time r  after the presynaptic one.  The resulting synaptic ma(cid:173)\ntrices have an outer-product form in which the oscillating patterns \nare  represented  as  complex  vectors.  In  a  simple  model,  the even \npart of A(r) enhances the resonant response to learned stimulus by \nreducing the effective  damping, while the odd part determines the \nfrequency of oscillation.  We relate our model to the olfactory cortex \nand hippocampus and their presumed roles  in forming  associative \nmemories and input representations. \n\n1 \n\nIntroduction \n\nRecent  studies  of synapses  between  pyramidal  neocortical  and  hippocampal  neu(cid:173)\nrons  [1,  2,  3,  4]  have revealed that changes in  synaptic efficacy  can  depend  on the \nrelative timing of pre- and postsynaptic spikes.  Typically,  a  presynaptic spike fol(cid:173)\nlowed by a postsynaptic one leads to an increase in efficacy  (LTP), while the reverse \ntemporal order leads to a  decrease  (LTD).  
The dependence of the change in synaptic efficacy on the difference τ between the two spike times may be characterized by a kernel which we denote A(τ) [4]. For hippocampal pyramidal neurons, the half-width of this kernel is around 20 ms. \n\nMany important neural structures, notably hippocampus and olfactory cortex, exhibit oscillatory activity in the 20-50 Hz range. Here the temporal variation of the neuronal firing can clearly affect the synaptic dynamics, and vice versa. In this paper we study a simple model for learning oscillatory patterns, based on the structure of the kernel A(τ) and other known physiology of these areas. We will assume that these synaptic changes in long-range lateral connections are driven by oscillatory, patterned input to a network that initially has only local synaptic connections. The result is an imprinting of the oscillatory patterns in the synapses, such that subsequent input of a similar pattern will evoke a strong resonant response. It can be viewed as a generalization to oscillatory networks with spike-timing-dependent learning of the standard scenario whereby stationary patterns are stored in Hopfield networks using the conventional Hebb rule. \n\n2 Model \n\nThe computational neurons of the model represent local populations of biological neurons that share common input. They follow the equations of motion [5] \n\nu̇_i = -α u_i - β_i⁰ g_v(v_i) + Σ_j J_ij⁰ g_u(u_j) + I_i,   (1) \nv̇_i = -α v_i + γ_i⁰ g_u(u_i) + Σ_{j≠i} W_ij⁰ g_u(u_j).   (2) \n\nHere u_i and v_i are membrane potentials for excitatory and inhibitory (formal) neuron i, α⁻¹ is their membrane time constant, and the sigmoidal functions g_u() and g_v() model the dependence of their outputs (interpreted as instantaneous firing rates) on their membrane potentials. The couplings β_i⁰ 
and γ_i⁰ are inhibitory-to-excitatory (resp. excitatory-to-inhibitory) connection strengths within local excitatory-inhibitory pairs, and for simplicity we take the external drive I_i(t) to act only on the excitatory units. We include nonlocal excitatory couplings J_ij⁰ between excitatory units and W_ij⁰ from excitatory units to inhibitory ones. In this minimal model, we ignore long-range inhibitory couplings, appealing to the fact that real anatomical inhibitory connections are predominantly short-ranged. (In what follows, we will sometimes use bold and sans serif notation (e.g., u, J) for vectors and matrices, respectively.) The structure of the couplings is shown in Fig. 1A. \n\nThe model is nonlinear, but here we will limit our treatment to an analysis of small oscillations around a stable fixed point {ū, v̄} determined by the DC part of the input. Performing the linearization and eliminating the inhibitory units [6, 5], we obtain \n\nü + [2α - J]u̇ + [α² + β(γ + W) - αJ]u = (∂_t + α)δI.   (3) \n\nHere u is now measured from the fixed point ū, δI is the time-varying part of the input, and the elements of J and W are related to those of J⁰ and W⁰ by W_ij = g'_u(ū_j)W_ij⁰ and J_ij = g'_u(ū_j)J_ij⁰. For simplicity, we have assumed that the effective local couplings β_i = g'_v(v̄_i)β_i⁰ and γ_i = g'_u(ū_i)γ_i⁰ are independent of i: β_i = β, γ_i = γ. With oscillatory inputs δI = ξe^{-iωt} + c.c., the oscillatory pattern elements ξ_i = |ξ_i|e^{-iφ_i} are complex, reflecting possible phase differences across the units. We likewise separate the response u = u⁺ + u⁻ (after the initial transients) into positive- and negative-frequency components u± (with u⁻ = u⁺* and u± ∝ e^{∓iωt}). Since u̇± = ∓iωu±, Eqn. 
(3) can be written \n\n[2α ± (i/ω)(α² + βγ - ω²)] u± = M± u± + (1 ± iα/ω) δI±,   (4) \n\na form that shows how the matrix \n\nM±(ω) ≡ J ∓ (i/ω)(βW - αJ)   (5) \n\ndescribes the effective coupling between local oscillators. 2α is the intrinsic damping and √(α² + βγ) the frequency of the individual oscillators. \n\nFigure 1: A. The model: In addition to the local excitatory-inhibitory connections (vertical solid lines), there are nonlocal long-range connections (dashed lines) between excitatory units (J_ij) and from excitatory to inhibitory units (W_ij). External inputs are fed to the excitatory units. B: Activation functions used in simulations for excitatory units (B.1) and inhibitory units (B.2). Crosses mark the equilibrium point (ū, v̄) of the system. \n\n2.1 Learning phase \n\nWe employ a generalized Hebb rule of the form \n\nδC_ij(t) = η ∫₀^T dt ∫_{-∞}^{∞} dτ y_i(t+τ) A(τ) x_j(t)   (6) \n\nfor synaptic weight C_ij, where x_j and y_i are the pre- and postsynaptic activities, measured relative to stationary levels at which no changes in synaptic strength occur. We consider a general kernel A(τ), although experimentally A(τ) > 0 (< 0) for τ > 0 (< 0). We will apply the rule to both J and W in our linearized network, where the firing rates g_u(u_i) and g_v(v_i) vary linearly with u_i and v_i, so we will use Eqn. (6) with x_j = u_j and y_i = u_i or v_i (measured from the fixed point v̄_i), respectively. \nWe assume oscillatory input δI = ξ⁰e^{-iω₀t} + c.c. during learning. 
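As a rough numerical illustration of the rule in Eqn. (6) (a sketch, not the simulation code of the paper; the exponential kernel shape, rates, and time step are illustrative assumptions), one can evaluate the double integral on discretized oscillatory traces and check that a postsynaptic trace lagging the presynaptic one yields net potentiation:

```python
import numpy as np

def A(tau, tau_s=0.02):
    # Illustrative antisymmetric STDP kernel: A(tau) > 0 for tau > 0 (LTP),
    # A(tau) < 0 for tau < 0 (LTD); half-width ~20 ms, as quoted for hippocampus.
    return np.sign(tau) * np.exp(-np.abs(tau) / tau_s)

def stdp_update(y_post, x_pre, dt, eta, kernel, tau_max):
    # Discretize  dC = eta * int_0^T dt int dtau  y(t+tau) A(tau) x(t)   (Eqn. 6)
    n = len(x_pre)
    total = 0.0
    for k in range(int(-tau_max / dt), int(tau_max / dt) + 1):
        if k >= 0:                       # y shifted forward by tau = k*dt
            corr = np.dot(y_post[k:], x_pre[:n - k])
        else:
            corr = np.dot(y_post[:n + k], x_pre[-k:])
        total += kernel(k * dt) * corr * dt * dt
    return eta * total

dt = 0.001
t = np.arange(0.0, 1.0, dt)
w = 2 * np.pi * 40.0                     # 40 Hz oscillation, as in the text
pre = np.cos(w * t)
post_lag = np.cos(w * (t - 0.005))       # post follows pre by 5 ms  -> net LTP
post_lead = np.cos(w * (t + 0.005))      # post precedes pre by 5 ms -> net LTD
dw_ltp = stdp_update(post_lag, pre, dt, 1e-3, A, 0.1)
dw_ltd = stdp_update(post_lead, pre, dt, 1e-3, A, 0.1)
```

With this antisymmetric kernel the update is proportional to sin(ωd), where d is the post-minus-pre delay, so the sign flips with the temporal order, consistent with the LTP/LTD phenomenology cited above.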
In the brain structures we are modeling, cholinergic modulation makes the long-range connections ineffective during learning [7]. Thus we set J = W = 0 in Eqn. (3) and find \n\nu_i⁺ = (ω₀ + iα)ξ_i⁰e^{-iω₀t} / [2αω₀ + i(α² + βγ - ω₀²)] ≡ U₀ξ_i⁰e^{-iω₀t}   (7) \n\nand, from (∂_t + α)v_i = γu_i, \n\nv_i⁺ = [γU₀/(α - iω₀)] ξ_i⁰e^{-iω₀t}.   (8) \n\nUsing these in the learning rule (6) leads to \n\nJ_ij ∝ η_J|U₀|²[Ã(ω₀)ξ_i⁰ξ_j⁰* + c.c.],   W_ij ∝ η_W|U₀|²[γÃ(ω₀)/(α - iω₀) ξ_i⁰ξ_j⁰* + c.c.],   (9) \n\nwhere Ã(ω) = ∫ dτ A(τ)e^{-iωτ} is the Fourier transform of A(τ), J₀ = 2πη_J|U₀|²/ω₀, and η_{J(W)} are the respective learning rates. When the rates are tuned such that η_J = η_W γβ/(α² + ω₀²) and when ω = ω₀, we have M⁺_ij = J₀Ã(ω₀)ξ_i⁰ξ_j⁰*, a generalization of the outer-product learning rule to the complex patterns ξ^μ from the Hopfield-Hebb form for real-valued patterns. For learning multiple patterns ξ^μ, μ = 1, 2, ..., the learned weights are simply sums of contributions from individual patterns like Eqns. (9) with ξ⁰ replaced by ξ^μ. \n\n2.2 Recall phase \n\nWe return to the single-pattern problem and study the simple case when η_J = η_W γβ/(α² + ω₀²). Consider first an input pattern δI = ξe^{-iωt} + c.c. that matches the stored pattern exactly (ξ = ξ⁰), but possibly oscillating at a different frequency. We then find, using Eqns. (9) in Eqn. (3), the (positive-frequency) response \n\nu⁺ = (ω + iα)ξ⁰e^{-iωt} / {2αω - (J₀/2)(ω + ω₀)A'(ω₀) + i[α² + βγ - (J₀/2)(ω + ω₀)A''(ω₀) - ω²]},   (10) \n\nwhere A'(ω₀) ≡ Re Ã(ω₀) and A''(ω₀) ≡ Im Ã(ω₀). For strong response at ω = ω₀, we require \n\nω₀² = α² + βγ - J₀ω₀A''(ω₀),   2α - J₀A'(ω₀) ≈ 0.   (11) \n\nThis means (1) the resonance frequency ω₀ is determined by A'', (2) the effective damping 2α - J₀A' should be small, and (3) deviation of ω from ω₀ reduces the responses. 
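The resonance structure implied by Eqn. (10) can be sketched numerically. The parameter values below are arbitrary illustrative choices, not those used in the paper; the learned-kernel terms J₀A'(ω₀) and J₀A''(ω₀) are simply set so that the tuning conditions of Eqn. (11) hold at an imprinted frequency of 41 Hz:

```python
import numpy as np

alpha = 50.0                       # illustrative damping rate (1/s)
beta_gamma = 5.0e4                 # illustrative local coupling product beta*gamma
w0 = 2 * np.pi * 41.0              # imprinted frequency, 41 Hz as in the figures
# Impose the tuning of Eqn. (11): small effective damping 2a - J0*A'(w0),
# and J0*A''(w0) chosen so the resonance sits exactly at w0.
J0A1 = 0.9 * 2 * alpha                         # J0 * A'(w0)
J0A2 = (alpha**2 + beta_gamma - w0**2) / w0    # J0 * A''(w0)

def response(w):
    # |u+| per unit drive amplitude, from the denominator of Eqn. (10)
    den = (2 * alpha * w - 0.5 * (w + w0) * J0A1
           + 1j * (alpha**2 + beta_gamma - 0.5 * (w + w0) * J0A2 - w**2))
    return np.abs(w + 1j * alpha) / np.abs(den)

ws = 2 * np.pi * np.linspace(20.0, 60.0, 401)
peak_f = ws[np.argmax(response(ws))] / (2 * np.pi)   # peaks near the imprinted 41 Hz
```

At ω = ω₀ the imaginary part of the denominator vanishes and the real part is the small residual damping, so the response curve is sharply peaked at the imprinted frequency, mirroring the frequency-tuning behavior reported in the simulations.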
\nIt is instructive to consider the case where the width of the time window for synaptic change is small compared with the oscillation period. Then we can expand Ã(ω₀) in ω₀: \n\nÃ(ω₀) ≈ a₀ - i a₁ω₀,  with a₀ = ∫ A(τ) dτ,  a₁ = ∫ τ A(τ) dτ.   (12) \n\nIn particular, A(τ) = δ(τ) gives a₀ = 1 and a₁ = 0 and the conventional Hebbian learning [5]. Experimentally, a₁ > 0, implying a resonant frequency greater than the intrinsic local frequency √(α² + βγ) obtained in the absence of long-range coupling. \n\nIf the drive ξ does not match the stored pattern (in phase and amplitude), the response will consist of two terms. The first has the form of Eqn. (10) but reduced in amplitude by an overlap factor ξ⁰*·ξ. (For convenience we use normalized pattern vectors.) The second term is proportional to the part of ξ orthogonal to the stored pattern. The J and W matrices do not act in this subspace, so the frequency dependence of this term is just that of uncoupled oscillators, i.e., Eqn. (10) with J₀ set equal to zero. This response is always highly damped and therefore small. \n\nIt is straightforward to extend this analysis to multiple imprinted patterns. The response consists of a sum of terms, one for each stored pattern. The term for each stored pattern is just like that described in the single-stored-pattern case: it has one part for the input component parallel to the stored pattern and another part for the component orthogonal to it. \n\nWe note that, in this linear analysis, an input which overlaps several stored patterns will (if the imprinting and input frequencies match) evoke a resonant response which is a linear combination of the stored patterns. Thus, a network tuned to operate in a nearly linear regime is able to interpolate in forming its representation of the input. 
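The short-kernel expansion in Eqn. (12) can be checked numerically. This sketch uses a hypothetical kernel with both even and odd parts (not a fitted physiological A(τ)) and compares its exact Fourier transform at ω₀ with the two-term expansion:

```python
import numpy as np

def kernel_ft(tau_s, w):
    # Hypothetical kernel whose LTP side (tau > 0) is heavier than its LTD side,
    # so both a0 = int A and a1 = int tau*A are positive.
    taus = np.linspace(-10 * tau_s, 10 * tau_s, 20001)
    dtau = taus[1] - taus[0]
    A = (1.0 + 0.5 * np.sign(taus)) * np.exp(-np.abs(taus) / tau_s)
    a0 = np.sum(A) * dtau
    a1 = np.sum(taus * A) * dtau
    exact = np.sum(A * np.exp(-1j * w * taus)) * dtau  # A~(w) = int A(tau) e^{-iw tau}
    return exact, a0 - 1j * a1 * w                     # exact vs. short-kernel expansion

w0 = 2 * np.pi * 40.0
exact_wide, approx_wide = kernel_ft(0.002, w0)         # 2 ms kernel width
exact_narrow, approx_narrow = kernel_ft(0.0005, w0)    # 0.5 ms: well inside a 25 ms period
```

The expansion error shrinks as the kernel narrows relative to the oscillation period, and the negative imaginary part (a₁ > 0) is what pushes the resonance above the intrinsic frequency √(α² + βγ).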
For categorical associative memory, on the other hand, a network has to work in the extreme nonlinear limit, responding with only the strongest stored pattern in an input mixture. As our network operates near the threshold for spontaneous oscillations, we expect that it should exhibit properties intermediate between these limits. We find that this is indeed the case in the simulations reported in the next section. From our analysis it turns out that the network behaves like a Hopfield memory (separate basins, without interpolation capability) for patterns with different imprinting frequencies, but at the same time it is able to interpolate among patterns which share a common frequency. \n\nFigure 2: Circles show non-linear simulation results, stars show the linear simulation results, while the dotted line shows the analytical prediction for the linearized model. A. Importance of frequency match: amplitude of response of output units as a function of the frequency of the current input. The frequency of the imprinted pattern is 41 Hz. B. Importance of amplitude and phase mismatch: amplitude of response as a function of the overlap between current input and imprinted pattern (i.e. |ξ⁰*·ξ|), for different presented input patterns ξ. C. Input-output relationship when two orthogonal patterns ξ¹ and ξ² have been imprinted at the same frequency ω = 41 Hz. The angle of the input pattern with respect to ξ¹ is shown as a function of the angle of the output pattern with respect to ξ¹, for many different input patterns. 
\n\n3 Simulations \n\nChecking the validity of our linear approximation in the analysis, we performed numerical simulations of both the non-linear equations (1, 2) and the linearized ones (3). We simulated the recall phase of a network consisting of 10 excitatory and 10 inhibitory cells. The connections J_ij and W_ij were calculated from Eqns. (9), where we used the approximation (12) for the kernel shape A(τ). Parameters were set in such a way that the selective resonance was in the 40-Hz range. In non-linear simulations we used different piecewise linear activation functions for g_u() and g_v(), as shown in Fig. 1B. We chose the parameters of the functions g_u() and g_v() so that the network equilibrium points ū_i, v̄_i were close to, but below, the high-gain region, i.e. at the points marked with crosses in Fig. 1B. \n\nThe results confirm that when the input pattern matches the imprinted one in frequency, amplitude and phase, the network responds with strong resonant oscillations. However, it does not resonate if the frequencies do not match, as shown in the frequency tuning curve in Fig. 2A. The behavior when the two frequencies are close to each other differs in the linear and nonlinear cases. However, in both cases a sharp selectivity in frequency is observed. The dependence on the overlap between the input and the stored pattern is shown in Fig. 2B. The non-linear case, indicated by circles, should be compared with the linear case, where the amplitude is always linear in the overlap. In the nonlinear case, the linearity holds roughly only for overlaps lower than about 0.4; for larger overlaps the amplification is as high as for the perfect match case. 
This means that input patterns with an overlap with the imprinted one greater than 0.4 lie within the basin of attraction of the imprinted pattern. \n\nFigure 3: Frequency selectivity: Response evoked on 3 of the 10 neurons. Oscillatory patterns ξ¹e^{-iω₁t} + c.c. and ξ²e^{-iω₂t} + c.c. have been imprinted, with ξ¹ ⊥ ξ² and ω₁ = 41 Hz, ω₂ = 63 Hz. During the learning phases the parameter a₁ of the kernel was tuned appropriately, i.e. a₁ = 0.1 when imprinting ξ¹ and a₁ = 1.1 when imprinting ξ². \n\nThe response elicited when two orthogonal patterns have been imprinted with the same frequency is shown in Fig. 2C. Let ξ¹e^{-iω₀t} + c.c. and ξ²e^{-iω₀t} + c.c. denote the imprinted patterns, and ξe^{-iω₀t} + c.c. be the input to the trained network. In both linear and non-linear simulations the network responds vigorously (with high-amplitude oscillations) to the drive if ξ is in the subspace spanned by the imprinted patterns, and fails to respond appreciably if ξ is orthogonal to that plane. When the input pattern ξ is in the plane spanned by the stored patterns, the resonant response u also lies in this plane. However, while in the linear case the output is proportional to the input, in agreement with the analytical analysis, in the non-linear case there are preferred directions in the stored pattern plane. 
The figure shows that, in the case simulated here, there are three stable attractors: ξ¹, ξ², and the symmetric linear combination (ξ¹ + ξ²)/√2. \n\nFinally we performed linear simulations storing two orthogonal patterns ξ¹e^{-iω₁t} + c.c. and ξ²e^{-iω₂t} + c.c. with two different imprinting frequencies. Fig. 3 shows a good performance of the network in separating the basins of attraction in this case. The response to a linear combination of the two patterns, (aξ¹ + bξ²)e^{-iω₂t} + c.c., is proportional to the part of the input whose imprinting frequency matches the current driving frequency. Linear combinations of the two imprinted patterns are not attractors if the two patterns do not share the same imprinting frequency. \n\n4 Summary and Discussion \n\nWe have presented a model of learning for memory or input representations in neural networks with input-driven oscillatory activity. The model structure is an abstraction of the hippocampus or the olfactory cortex. We propose a simple generalized Hebbian rule, using temporal-activity-dependent LTP and LTD, to encode both magnitudes and phases of oscillatory patterns into the synapses in the network. After learning, the model responds resonantly to inputs which have been learned (or, for networks which operate essentially linearly, to linear combinations of learned inputs), but negligibly to other input patterns. Encoding both amplitude and phase enhances computational capacity, for which the price is having to learn both the excitatory-to-excitatory and the excitatory-to-inhibitory connections. 
Our model puts constraints on the form of the learning kernel A(τ) that should be experimentally observed; e.g., for small oscillation frequencies, it requires that the overall LTP dominates the overall LTD, but this requirement should be modified if the stored oscillations are of high frequencies. Plasticity in the excitatory-to-inhibitory connections (for which experimental evidence and investigation is still scarce) is required by our model for storing phase-locked but asynchronous oscillation patterns. \n\nAs for the Hopfield model, we distinguish two functional phases: (1) the learning phase, in which the system is clamped dynamically to the external inputs, and (2) the recall phase, in which the system dynamics is determined by both the external inputs and the internal interactions. \n\nA special property of our model in the linear regime is the following interpolation capability: under a given oscillation frequency, once the system has learned a set of representation states, all other states in the subspace spanned by the learned states can also evoke vigorous responses. Hippocampal place cells could employ such a representation. Each cell has a localised \"place field\", and the superposition of activity of several cells with nearby place fields can represent continuously-varying position. The locality of the place fields also means that this representation is conservative (and thus robust), in the sense that interpolation does not extend beyond the spatial range of the experienced locations or to locations in between two learned but distant and disjoint spatial regions. \n\nOf course, this interpolation property is not always desirable. 
For instance, in categorical memory, one does not want inputs which are linear combinations of stored patterns to elicit responses which are also similar linear combinations. Suitable nonlinearity can (as we saw in the last section) enable the system to perform categorization: one way involves storing different patterns (or, by implication, different classes of patterns) at different frequencies. For instance, in a multimodal area, \"place fields\" might be stored at one oscillation frequency, and (say) odor memories at another. It seems likely to us that the brain may employ different kinds and degrees of nonlinearity in different areas or at different times to enhance the versatility of its computations. \n\nReferences \n\n[1] H Markram, J Lubke, M Frotscher, and B Sakmann, Science 275, 213 (1997). \n[2] J C Magee and D Johnston, Science 275, 209 (1997). \n[3] D Debanne, B H Gahwiler, and S M Thompson, J Physiol 507, 237 (1998). \n[4] G Q Bi and M M Poo, J Neurosci 18, 10464 (1998). \n[5] Z Li and J Hertz, Network: Computation in Neural Systems 11, 83-102 (2000). \n[6] Z Li and J J Hopfield, Biol Cybern 61, 379-92 (1989). \n[7] M E Hasselmo, Neural Comp 5, 32-44 (1993). \n", "award": [], "sourceid": 1828, "authors": [{"given_name": "Silvia", "family_name": "Scarpetta", "institution": null}, {"given_name": "Zhaoping", "family_name": "Li", "institution": null}, {"given_name": "John", "family_name": "Hertz", "institution": null}]}