{"title": "The Cerebellum Chip: an Analog VLSI Implementation of a Cerebellar Model of Classical Conditioning", "book": "Advances in Neural Information Processing Systems", "page_first": 577, "page_last": 584, "abstract": null, "full_text": "The cerebellum chip: an analog VLSI implementation of a cerebellar model of classical conditioning\n\nConstanze Hofst\u00f6tter, Manuel Gil, Kynan Eng,\nGiacomo Indiveri, Matti Mintz, J\u00f6rg Kramer* and Paul F. M. J. Verschure\n\nInstitute of Neuroinformatics\nUniversity/ETH Zurich\nCH-8057 Zurich, Switzerland\npfmjv@ini.phys.ethz.ch\n\nAbstract\n\nWe present a biophysically constrained cerebellar model of\nclassical conditioning, implemented using a neuromorphic analog\nVLSI (aVLSI) chip. Like its biological counterpart, our cerebellar\nmodel is able to control adaptive behavior by predicting the\nprecise timing of events. Here we describe the functionality of the\nchip and present its learning performance, as evaluated in\nsimulated conditioning experiments at the circuit level and in\nbehavioral experiments using a mobile robot. 
We show that this\naVLSI model supports the acquisition and extinction of adaptively\ntimed conditioned responses under real-world conditions with\nultra-low power consumption.\n\n*J\u00f6rg Kramer designed the cerebellum chip that was first tested at the 2002 Telluride\nNeuromorphic Engineering Workshop. Tragically, he died soon afterwards while hiking\non Telescope Peak on 24 July, 2002.\n\n1 Introduction\n\nThe association of two correlated stimuli, an initially neutral conditioned stimulus\n(CS) which predicts a meaningful unconditioned stimulus (US), leading to the\nacquisition of an adaptive conditioned response (CR), is one of the most essential\nforms of learning. Pavlov introduced the classical conditioning paradigm in the\nearly 20th century to study associative learning (Pavlov 1927). In classical\nconditioning training, an animal is repeatedly exposed to a CS followed by a US\nafter a certain inter-stimulus interval (ISI). The animal learns to elicit a CR\nmatched to the ISI, reflecting its knowledge about an association between the CS,\nUS, and their temporal relationship. Our earlier software implementation of a\nbiophysically constrained model of the cerebellar circuit underlying classical\nconditioning (Verschure and Mintz 2001; Hofst\u00f6tter et al. 2002) provided an\nexplanation of this phenomenon by assuming a negative feedback loop between the\ncerebellar cortex, deep nucleus and inferior olive. 
It could acquire and extinguish\ncorrectly timed CRs over a range of ISIs in simulated classical conditioning\nexperiments, as well as in associative obstacle avoidance tasks using a mobile\nrobot. In this paper we present the analog VLSI (aVLSI) implementation of this\ncerebellum model, the cerebellum chip, and the results of chip-level and\nbehavioral robot experiments.\n\n2 The model circuit and aVLSI implementation\n\nFigure 1: Anatomy of the cerebellar model circuit (left) and the block diagram of\nthe corresponding chip (right).\n\nThe model (Figure 1) is based on the identified cerebellar pathways of CS, US and\nCR (Kim and Thompson 1997) and includes four key hypotheses which were\nimplemented in the earlier software model (Hofst\u00f6tter et al. 2002):\n1. CS related parallel fiber (pf) and US related climbing fiber (cf) signals\n   converge at Purkinje cells (PU) in the cerebellum (Steinmetz et al. 1989). The\n   direction of the synaptic changes at the pf-PU-synapse depends on the temporal\n   coincidence of pf and cf activity. Long-term depression (LTD) is induced by pf\n   activity followed by cf activity within a certain time interval, while pf activity\n   alone induces long-term potentiation (LTP) (Hansel et al. 2001).\n2. A prolonged second messenger response to pf stimulation in the dendrites of\n   PU constitutes an eligibility trace from the CS pathway (Sutton and Barto\n   1990) that bridges the ISI (Fiala et al. 1996).\n3. A microcircuit (Ito 1984) comprising PU, deep nucleus (DN) and inferior olive\n   (IO) forms a negative feedback loop. Shunting inhibition of IO by DN blocks\n   the reinforcement pathway (Thompson et al. 
1998), thus controlling the\n   induction of LTD and LTP at the pf-PU-synapse.\n4. DN activity triggers behavioral CRs (McCormick and Thompson 1984). The\n   inhibitory PU controls DN activity by a mechanism called rebound excitation\n   (Hesslow 1994): When DN cells are disinhibited from PU input, their\n   membrane potential slowly repolarises and spikes are emitted if a certain\n   threshold is reached. Thereby, the correct timing of CRs results from the\n   adaptation of a pause in PU spiking following the CS.\nIn summary, in the model the expression of a CR is triggered by DN rebound\nexcitation upon release from PU inhibition. The precise timing of a CR is\ndependent on the duration of an acquired pause in PU spiking following a CS. The\nPU response is regulated by LTD and LTP at the pf-PU-synapse under the control\nof a negative feedback loop comprising DN, PU and IO.\nWe implemented an analog VLSI version of the cerebellar model using a standard\n1.6 \u00b5m CMOS technology; it occupies an area of approximately 0.25 mm\u00b2. A\nblock diagram of the hardware model is shown in Figure 1. The CS block receives\nthe conditioned stimulus and generates two signals: an analog long-lasting, slowly\ndecaying trace (cs_out) and an equally long binary pulse (cs_wind). Similarly, the\nUS block receives an unconditioned stimulus and generates a fast pulse (us_out).\nThe two pulses cs_wind and us_out are sent to the LT-ISI block that is responsible\nfor performing LTP and LTD, upregulating or downregulating the synaptic weight\nsignal w. This signal determines the gain by which the cs_out trace is multiplied in\nthe MU block. 
The output of the multiplier MU is sent on to the PU block, together\nwith the us_out signal. The PU block is a linear integrate-and-fire neuron (the axon-hillock\ncircuit) connected to a constant current source that produces regular spontaneous\nactivity. The current source is gated by the digital cs_wind signal, such that the\nspontaneous activity is shut off for the duration of the cs_out trace.\nThe chip allowed one of three learning rules to be connected. Experiments showed\nthat an ISI-dependent learning rule, in which short ISIs result in the strongest LTD,\nwas the most useful (Kramer and Hofst\u00f6tter 2002). Two elements were added to\nadapt the model circuit for real-world robot experiments. Firstly, to prevent the\nexpression of a CR after a US had already been triggered, an inhibitory connection\nfrom IO to CRpathway was added. Secondly, the transduction delay (TD) from the\naVLSI circuit to any effectors (e.g. motor controls of a robot) had to be taken into\naccount, which was done by adding a delay from DN to IO of 500ms.\nThe chip's power consumption is conservatively estimated at around 100 \u00b5W\n(excluding off-chip interfacing), based on measurements from similar integrate-and-fire\nneuron circuits (Indiveri 2003). This figure is an order of magnitude lower\nthan what could be achieved using conventional microcontrollers (typically 1-10\nmW), and could be improved further by optimising the circuit design.\n\n3 Simulated conditioning experiments\n\nThe aim of the \"in vitro\" simulated conditioning experiments was to understand the\nlearning performance of the chip. To obtain a meaningful evaluation of the\nperformance of the learning system for both the simulated conditioning\nexperiments and the robot experiments, the measure of effective CRs was used. 
In\nacquisition experiments CS-US pairs are presented with a fixed ISI. Whenever a\nCR occurs that precedes the US, the US signal is not propagated to PU due to the\ninhibitory connection from DN to IO. Thus in the context of acquisition\nexperiments a CR is defined as effective if it prevents the occurrence of a US spike\nat PU. In contrast, in robot experiments an effective CR is defined at the\nbehavioral level, including only CRs that prevent the US from occurring.\n\nFigure 2: Learning related response changes in the cerebellar aVLSI chip. The most\nrelevant neural responses to a CS-US pair (ISI of 3s, ITI of 12s) are presented for a\ntrial before (naive) significant learning occurred and when a correctly timed CR is\nexpressed (trained). CS-related pf and US-related cf signals are indicated by\nvertical lines passing through the subplots. A CS-related pf-signal evokes a\nprolonged response in the pf-PU-synapse, the CS-trace (Trace subplot). While an\nactive CS-trace is present, an inhibitory element (I) is active which inactivates an\nelement representing the spontaneous activity of PU (Hofst\u00f6tter et al. 2002). (A)\nThe US-related cf input occurs while there is an active CS-trace (Trace subplot), in\nthis case following the CS with an ISI of 3s. LTD predominates over LTP under\nthese conditions (Weight subplot). Because the PU membrane potential (PU)\nremains above spiking threshold, PU is active and supplies constant inhibition to\nDN (DN) while in the CS-mode. Thus, DN cannot repolarize and remains inactive\nso that no CR is triggered. 
(B) Later in the experiment, the synaptic weight of the\npf-PU-synapse (Weight) has been reduced due to previous LTD. As a result,\nfollowing a CS-related pf input, the PU potential (PU subplot) falls below the\nspiking threshold, which leads to a pause in PU spiking. The DN membrane\npotential repolarises, so that rebound spikes are emitted (DN subplot). This\nrebound excitation triggers a CR. DN inhibition of IO prevents US related cf-activity.\nThus, although a US signal is still presented to the circuit, the reinforcing\nUS pathway is blocked. These conditions induce only LTP, raising the synaptic\nweight of the pf-PU-synapse (Weight subplot).\n\nThe results we obtained were broadly consistent with those reported in the\nbiological literature (Ito 1984; Kim and Thompson 1997). The correct operation of\nthe circuit can be seen in the cell traces illustrating the properties of the aVLSI\ncircuit components before significant learning (Figure 2A), and after a CR is\nexpressed (Figure 2B). Long-term acquisition experiments (25 blocks of 10 trials\neach over 50 minutes) showed that chip functions remained stable over a long time\nperiod. In each trial the CS was followed by a US with a fixed ISI of 3s; the inter\ntrial interval (ITI) was 12s. The number of effective CRs shows an initial fast\nlearning phase followed by a stable phase with higher percentages of effective CRs\n(Figure 3B). In the stable phase the percentage of effective CRs per block\nfluctuates around 80-90%. There are fluctuations of up to 500ms in the CR latency\ncaused by the interaction of LTD and LTP in the stable phase, but the average CR\nlatency remains fairly constant. 
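The interaction of LTD and LTP in the stable phase can be illustrated with a toy, trial-level simulation. The sketch below is not the chip's circuit or the authors' software model: it assumes an exponentially decaying CS trace, a PU pause that begins once the weighted trace falls below a threshold, and a fixed DN rebound delay. All function names and parameter values (`theta`, `tau`, `rebound_delay`, `ltd`, `ltp`) are hypothetical and chosen only for illustration.

```python
import math


def run_acquisition(n_trials=250, isi=3.0, w0=1.0, theta=0.2, tau=2.0,
                    rebound_delay=0.3, ltd=0.05, ltp=0.01):
    # Toy trial-level model (assumed, not taken from the chip design).
    # Returns one (cr_latency, effective) tuple per CS-US trial.
    w = w0
    history = []
    for _ in range(n_trials):
        # Assumed CS trace: w * exp(-t / tau). PU pauses once it drops
        # below theta, i.e. at t = tau * ln(w / theta).
        pause_onset = tau * math.log(w / theta) if w > theta else 0.0
        # DN rebound spikes (the CR) follow the pause after a fixed delay.
        cr_latency = pause_onset + rebound_delay
        # A CR that precedes the US blocks the cf signal (DN inhibits IO).
        effective = cr_latency < isi
        # pf activity alone potentiates the pf-PU synapse (LTP).
        w += ltp
        if not effective:
            # The US-related cf input reached PU during the trace (LTD).
            w -= ltd
        # Keep the weight in a bounded range.
        w = min(max(w, 0.0), 1.0)
        history.append((cr_latency, effective))
    return history


def percent_effective_per_block(history, block_size=10):
    # Percentage of effective CRs in consecutive blocks of trials.
    return [100.0 * sum(eff for _, eff in history[i:i + block_size]) / block_size
            for i in range(0, len(history), block_size)]


history = run_acquisition()
blocks = percent_effective_per_block(history)
print(blocks[0], blocks[-1])
```

With these made-up parameters the weight drops during the early trials until the CR precedes the US, after which occasional LTD trials offset the slow LTP drift. The numbers are not fitted to the measurements reported here.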
\nFigure 4 shows the average of five acquisition experiments (5 blocks of 10 trials\nper experiment) for ISIs of 2.5s, 3s and 3.5s. The curves are similar in shape to the\nones in the long-term experiment. The CR latency quickly adjusts to match the ISI\nand remains stable thereafter (Figure 4A). The effect of the ISI-dependent learning\nrule can be seen in two ways: firstly, the shorter the ISI, the faster the stable phase\nis reached, denoting faster learning. Secondly, the shorter the ISI, the better the\nperformance in terms of percentage of effective CRs (Figure 4B). The parameters\nof the chip were tuned to optimally encode short ISIs in the range of 1.75s to 4.5s.\nSeparate experiments showed that the chip could also adapt rapidly to changes in\nthe ISI within this range after initial learning.\n\nFigure 3: Long-term changes in CR latency (A) and % effective CRs (B) per block\nof 10 CSs during acquisition. Experiment length = 50min., ISI = 3s, ITI = 12s.\n(Error bar = 1 std. dev.)\n\nFigure 4: Average of five acquisition experiments per block of 10 CSs for ISIs of\n2.5s ( ), 3s (*) and 3.5s ( ). (A) Avg. CR latency. (B) Avg. % effective CRs.\n(Error bar = 1 std. dev.)\n\n4 Robot associative learning experiments\n\nThe \"in vivo\" learning capability of the chip was evaluated by interfacing it to a\nrobot and observing its behavior in an unsupervised obstacle avoidance task.\nExperiments were performed using a Khepera microrobot (K-team, Lausanne,\nSwitzerland, Figure 5A) in a circular arena with striped walls (Figure 5C). 
The\nrobot was equipped with 6 proximal infra-red (IR) sensors (Figure 5B). Activation\nof these sensors (US) due to a collision triggered a turn of ~110\u00b0 in the opposite\ndirection (UR). A line camera (64 pixels x 256 gray-levels) constituted the distal\nsensor, with detection of a certain spatial frequency (~0.14 periods/degree)\nsignalling the CS. Visual CSs and collision USs were conveyed to CSpathway and\nUSpathway on the chip. The activation of CRpathway triggered a motor CR: a 1s\nlong regression followed by a turn of ~180\u00b0. Communication between the chip and\nthe robot was performed using Matlab on a PC. The control program could be\ndownloaded to the robot's processor, allowing the robot to act fully autonomously.\nIn each experiment, the robot was placed in the circular arena, which it explored\nat a constant speed of ~4 cm/s. A spatial frequency CS was\ndetected at some distance when the robot approached the wall, followed by a\ncollision with the wall, stimulating the IR sensors and thus triggering a US.\nConsequently the CS was correlated with the US, predicting it. The ISIs of these\nstimuli were variable, due to noise in sensor sampling and variations in the angle\nat which the robot approached the wall.\n\nFigure 5: (A) Khepera microrobot with aVLSI chip mounted on top. (B) Only the\nforward sensors were used during the experiments. (C) The environment: a 60cm\ndiameter circular arena surrounded by a 15cm high wall. A pattern of vertical,\nequally sized black and white bars was placed on the wall.\n\nAssociative learning mediated by the cerebellum chip significantly altered the\nrobot's behavior in the obstacle avoidance task (Figure 6) over the course of each\nexperiment. 
In the initial learning phase, the behavior was UR driven: the robot\ndrove forwards until it collided with the wall, only then performing a turn (Figure\n6A1). In the trained phase, the robot usually turned just before it collided with the\nwall (Figure 6A2), reducing the number of collisions. The positions of the robot\nwhen a CS, US or CR event occurred in these two phases are shown in Figure 6B1\nand B2. The CRs were not expressed immediately after the CSs, but rather with a\nCR latency adjusted to just prevent collisions (USs). Not all USs were avoided in\nthe trained phase due to some excessively short ISIs (Figure 7) and normal\nextinction processes over many unreinforced trials. After the learning phase the\npercentage of effective CRs fluctuated between 70% and 100% (Figure 7).\n\nFigure 6: Learning performance of the robot. (Top row) Trajectories of the robot.\nThe white circle with the black dot in the center indicates the beginning of\ntrajectories. (Bottom row) The same periods of the experiment examined at the\ncircuit level:   = CS, * = US,   = CR. (A1, B1) Beginning of the experiment (CS\n3-15). (A2, B2) Later in the experiment (CS 32-44).\n\nFigure 7: Trends in learning behavior (average of 5 experiments, 25 min. each). 90\nCSs were presented in each experiment. Error bars indicate one standard deviation.\n(A) Average percentage of effective CRs over 9 blocks of 10 CSs. (B) Number of\nCS occurrences ( ), US occurrences (*) and CR occurrences ( ). 
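Block-wise curves such as the one in Figure 7A could be computed from per-CS outcomes along the following lines. This is a minimal sketch, not the authors' analysis code: the function names are hypothetical, the two short outcome lists are synthetic placeholders rather than measured data, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def block_percentages(outcomes, block_size=10):
    # outcomes: one boolean per CS, True if the CR prevented the US.
    # Only complete blocks are counted.
    return [100.0 * sum(outcomes[i:i + block_size]) / block_size
            for i in range(0, len(outcomes) - block_size + 1, block_size)]


def summarize(experiments, block_size=10):
    # Mean and sample standard deviation per block across experiments.
    per_experiment = [block_percentages(o, block_size) for o in experiments]
    # zip(*...) groups the values of the same block index together.
    return [(mean(b), stdev(b)) for b in zip(*per_experiment)]


# Synthetic placeholder outcomes for two hypothetical runs of 20 CSs each.
exp_a = [False] * 8 + [True] * 2 + [True] * 9 + [False]
exp_b = [False] * 6 + [True] * 4 + [True] * 7 + [False] * 3
summary = summarize([exp_a, exp_b])
for block_index, (avg, sd) in enumerate(summary, start=1):
    print(block_index, round(avg, 1), round(sd, 1))
```

Counting only complete blocks is a design choice of this sketch; a trailing partial block would otherwise be scored against a smaller denominator.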
\n\n5 Discussion\n\nWe have presented one of the first examples of a biologically constrained model of\nlearning implemented in hardware. Our aVLSI cerebellum chip supports the\nacquisition and extinction of adaptively timed responses under noisy, real world\nconditions. These results provide further evidence for the role of the cerebellar\ncircuit embedded in a synaptic feedback loop in the learning of adaptive behavior,\nand pave the way for the creation of artefacts with embedded ultra low-power\nlearning capabilities.\n\n6 References\n\nFiala, J. C., Grossberg, S. and Bullock, D. (1996). Metabotropic glutamate receptor\nactivation in cerebellar Purkinje cells as substrate for adaptive timing of the classical\nconditioned eye-blink response. Journal of Neuroscience 16: 3760-3774.\nHansel, C., Linden, D. J. and D'Angelo, E. (2001). Beyond parallel fiber LTD, the diversity\nof synaptic and nonsynaptic plasticity in the cerebellum. Nature Neuroscience 4: 467-475.\nHesslow, G. (1994). Inhibition of classical conditioned eyeblink response by stimulation of\nthe cerebellar cortex in decerebrate cat. Journal of Physiology 476: 245-256.\nHofst\u00f6tter, C., Mintz, M. and Verschure, P. F. M. J. (2002). The cerebellum in action: a\nsimulation and robotics study. European Journal of Neuroscience 16: 1361-1376.\nIndiveri, G. (2003). A low-power adaptive integrate-and-fire neuron circuit. IEEE\nInternational Symposium on Circuits and Systems, Bangkok, Thailand, 4: 820-823.\nIto, M. (1984). The modifiable neuronal network of the cerebellum. Japanese Journal of\nPhysiology 5: 781-792.\nKim, J. J. and Thompson, R. F. (1997). 
Cerebellar circuits and synaptic mechanisms\ninvolved in classical eyeblink conditioning. Trends in the Neurosciences 20(4): 177-181.\nKramer, J. and Hofst\u00f6tter, C. (2002). An aVLSI model of cerebellar mediated associative\nlearning. Telluride Workshop, CO, USA.\nMcCormick, D. A. and Thompson, R. F. (1984). Neuronal response of the rabbit cerebellum\nduring acquisition and performance of a classical conditioned nictitating membrane-eyelid\nresponse. Journal of Neuroscience 4: 2811-2822.\nPavlov, I. P. (1927). Conditioned Reflexes, Oxford University Press.\nSteinmetz, J. E., Lavond, D. G. and Thompson, R. F. (1989). Classical conditioning in\nrabbits using pontine nucleus stimulation as a conditioned stimulus and inferior olive\nstimulation as an unconditioned stimulus. Synapse 3: 225-233.\nSutton, R. S. and Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement.\nIn Learning and Computational Neuroscience: Foundations of Adaptive Networks, MIT Press:\nchapter 12, 497-537.\nThompson, R. F., Thompson, J. K., Kim, J. J. and Shinkman, P. G. (1998). The nature of\nreinforcement in cerebellar learning. Neurobiology of Learning and Memory 70: 150-176.\nVerschure, P. F. M. J. and Mintz, M. (2001). A real-time model of the cerebellar circuitry\nunderlying classical conditioning: A combined simulation and robotics study.\nNeurocomputing 38-40: 1019-1024. 
\n \n\n\n\n\n\n \n\n\f\n", "award": [], "sourceid": 2703, "authors": [{"given_name": "Constanze", "family_name": "Hofstoetter", "institution": null}, {"given_name": "Manuel", "family_name": "Gil", "institution": null}, {"given_name": "Kynan", "family_name": "Eng", "institution": null}, {"given_name": "Giacomo", "family_name": "Indiveri", "institution": null}, {"given_name": "Matti", "family_name": "Mintz", "institution": null}, {"given_name": "J\u00f6rg", "family_name": "Kramer", "institution": null}, {"given_name": "Paul", "family_name": "Verschure", "institution": null}]}