{"title": "Bit-Serial Neural Networks", "book": "Neural Information Processing Systems", "page_first": 573, "page_last": 583, "abstract": null, "full_text": "BIT-SERIAL NEURAL NETWORKS \n\nAlan F. Murray, Anthony V. W. Smith and Zoe F. Butler. \n\nDepartment of Electrical Engineering, University of Edinburgh, \n\nThe King's Buildings, Mayfield Road, Edinburgh, \n\nScotland, EH9 3JL. \n\nABSTRACT \n\nA bit-serial VLSI neural network is described from an initial architecture for a synapse array through to silicon layout and board design. The issues surrounding bit-serial computation and analog/digital arithmetic are discussed, and the parallel development of a hybrid analog/digital neural network is outlined. Learning and recall capabilities are reported for the bit-serial network, along with a projected specification for a 64-neuron, bit-serial board operating at 20 MHz. This technique is extended to a 256-neuron ($256^2$ synapses) network with an update time of 3 ms, using a \"paging\" technique to time-multiplex calculations through the synapse array. \n\n1. INTRODUCTION \n\nThe functions a synthetic neural network may aspire to mimic are the ability to consider many solutions simultaneously, an ability to work with corrupted data and a natural fault tolerance. This arises from the parallelism and distributed knowledge representation, which give rise to gentle degradation as faults appear. These functions make neural networks attractive for implementation in VLSI and WSI. For example, the natural fault tolerance could be useful in silicon wafers with imperfect yield, where the network degradation is approximately proportional to the non-functioning silicon area. 
\n\nTo cast neural networks in engineering language, a neuron is a state machine that is either \"on\" or \"off\", and which in general assumes intermediate states as it switches smoothly between these extrema. The synapses weight the signals from a transmitting neuron such that it is more or less excitatory or inhibitory to the receiving neuron. The set of synaptic weights determines the stable states and represents the learned information in a system. \n\nThe neural state, $V_i$, is related to the total neural activity stimulated by inputs to the neuron through an activation function, $F$. Neural activity is the level of excitation of the neuron, and the activation function describes how the neural state responds to a change in that activity. The neural output state at time $t$, $V_i^t$, is related to $x_i^t$ by \n\n$$V_i^t = F(x_i^t) \\qquad (1)$$ \n\nThe activation function is a \"squashing\" function ensuring that (say) $V_i$ is 1 when $x_i$ is large and -1 when $x_i$ is small. The neural update function is therefore straightforward: \n\n$$x_i^{t+1} = x_i^t + \\delta \\sum_{j=0}^{n-1} T_{ij} V_j^t \\qquad (2)$$ \n\nwhere $\\delta$ represents the rate of change of neural activity, $T_{ij}$ is the synaptic weight and $n$ is the number of terms, giving an $n$-neuron array [1]. \n\n\u00a9 American Institute of Physics 1988 \n\nAlthough the neural function is simple enough, in a totally interconnected $n$-neuron network there are $n^2$ synapses requiring $n^2$ multiplications and summations and a large number of interconnects. The challenge in VLSI is therefore to design a simple, compact synapse that can be repeated to build a VLSI neural network with manageable interconnect. In a network with fixed functionality, this is relatively straightforward. 
If the network is to be able to learn, however, the synaptic weights must be programmable, and the synapse is therefore more complicated. \n\n2. DESIGNING A NEURAL NETWORK IN VLSI \n\nThere are fundamentally two approaches to implementing any function in silicon: digital and analog. Each technique has its advantages and disadvantages, and these are listed below, along with the merits and demerits of bit-serial architectures in digital (synchronous) systems. \n\nDigital vs. analog: The primary advantage of digital design for a synapse array is that digital memory is well understood, and can be incorporated easily. Learning networks are therefore possible without recourse to unusual techniques or technologies. Other strengths of a digital approach are that design techniques are advanced, automated and well understood, and that noise immunity and computational speed can be high. Unattractive features are that digital circuits of this complexity need to be synchronous and all states and activities are quantised, while real neural networks are asynchronous and unquantised. Furthermore, digital multipliers occupy a large silicon area, giving a low synapse count on a single chip. \n\nThe advantages of analog circuitry are that asynchronous behaviour and smooth neural activation are automatic. Circuit elements can be small, but noise immunity is relatively low and arbitrarily high precision is not possible. Most importantly, no reliable analog, non-volatile memory technology is as yet readily available. For this reason, learning networks lend themselves more naturally to digital design and implementation. \n\nSeveral groups are developing neural chips and boards, and the following listing does not pretend to be exhaustive. 
It is included, rather, to indicate the spread of activity in this field. Analog techniques have been used to build resistor/operational amplifier networks [2,3] similar to those proposed by Hopfield and Tank [4]. A large group at Caltech is developing networks implementing early vision and auditory processing functions using the intrinsic nonlinearities of MOS transistors in the subthreshold regime [5,6]. The problem of implementing analog networks with electrically programmable synapses has been addressed using CCD/MNOS technology [7]. Finally, Garth [8] is developing a digital neural accelerator board (\"Netsim\") that is effectively a fast SIMD processor with supporting memory and communications chips. \n\nBit-serial vs. bit-parallel: Bit-serial arithmetic and communication is efficient for computational processes, allowing good communication within and between VLSI chips and tightly pipelined arithmetic structures. It is ideal for neural networks as it minimises the interconnect requirement by eliminating multi-wire busses. Although a bit-parallel design would be free from computational latency (delay between input and output), pipelining makes optimal use of the high bit-rates possible in serial systems, and makes for efficient circuit usage. \n\n2.1 An asynchronous pulse stream VLSI neural network: \n\nIn addition to the digital system that forms the substance of this paper, we are developing a hybrid analog/digital network family. This work is outlined here, and has been reported in greater detail elsewhere [9, 10, 11]. The generic (logical and layout) architecture of a single network of $n$ totally interconnected neurons is shown schematically in figure 1. 
Neurons are represented by circles, which signal their states, $V_i$, upward into a matrix of synaptic operators. The state signals are connected to an $n$-bit horizontal bus running through the synaptic array, with a connection to each synaptic operator in every column. All columns have $n$ operators (denoted by squares) and each operator adds its synaptic contribution, $T_{ij} V_j$, to the running total of activity for the neuron $i$ at the foot of the column. The synaptic function is therefore to multiply the signalling neuron state, $V_j$, by the synaptic weight, $T_{ij}$, and to add this product to the running total. This architecture is common to both the bit-serial and pulse-stream networks. \n\n[Figure 1 diagram labels: Synapse; States { $V_j$ }; Neurons; $j=0$ to $j=n-1$.] \n\nFigure 1. Generic architecture for a network of $n$ totally interconnected neurons. \n\nThis type of architecture has many attractions for implementation in 2-dimensional silicon as the summation $\\sum_j T_{ij} V_j$ is distributed in space. The interconnect requirement ($n$ inputs to each neuron) is therefore distributed through a column, reducing the need for long-range wiring. The architecture is modular, regular and can be easily expanded. \n\nIn the hybrid analog/digital system, the circuitry uses a \"pulse stream\" signalling method similar to that in a natural neural system. Neurons indicate their state by the presence or absence of pulses on their outputs, and synaptic weighting is achieved by time-chopping the presynaptic pulse stream prior to adding it to the postsynaptic activity summation. It is therefore asynchronous and imposes no fundamental limitations on the activation or neural state. Figure 2 shows the pulse stream mechanism in more detail. 
The synaptic weight is stored in digital memory local to the operator. Each synaptic operator has an excitatory and inhibitory pulse stream input and output. The resultant product of a synaptic operation, $T_{ij} V_j$, is added to the running total propagating down either the excitatory or inhibitory channel. One binary bit (the MSBit) of the stored $T_{ij}$ determines whether the contribution is excitatory or inhibitory. \n\nThe incoming excitatory and inhibitory pulse stream inputs to a neuron are integrated to give a neural activation potential that varies smoothly from 0 to 5 V. This potential controls a feedback loop with an odd number of logic inversions and thus forms a switched \"ring-oscillator\". If the inhibitory input dominates, the feedback loop is broken. If excitatory spikes subsequently dominate at the input, the neural activity rises to 5 V and the feedback loop oscillates with a period determined by a delay around the loop. The resultant periodic waveform is then converted to a series of voltage spikes, whose pulse rate represents the neural state, $V_i$. Interestingly, a not dissimilar technique is reported elsewhere in this volume, although the synapse function is executed differently [12]. \n\nFigure 2. Pulse stream arithmetic. Neurons are denoted by \u25cb and synaptic operators by \u25a1. \n\n3. A 5-STATE BIT-SERIAL NEURAL NETWORK \n\nThe overall architecture of the 5-state bit-serial neural network is identical to that of the pulse stream network. 
It is an array of $n^2$ interconnected synchronous synaptic operators, and whereas the pulse stream method allowed $V_j$ to assume all values between \"off\" and \"on\", in the 5-state network $V_j$ is constrained to 0, \u00b10.5 or \u00b11. The resultant activation function is shown in Figure 3. Full digital multiplication is costly in silicon area, but multiplication of $T_{ij}$ by $V_j = 0.5$ merely requires the synaptic weight to be right-shifted by 1 bit. Similarly, multiplication by 0.25 involves a further right-shift of $T_{ij}$, and multiplication by 0.0 is trivially easy. $V_j < 0$ is not problematic, as a switchable adder/subtractor is not much more complex than an adder. Five neural states are therefore feasible with circuitry that is only slightly more complex than a simple serial adder. The neural state expands from a 1-bit to a 3-bit (5-state) representation, where the bits represent \"add/subtract?\", \"shift?\" and \"multiply by 0?\". \n\nFigure 4 shows part of the synaptic array. Each synaptic operator includes an 8-bit shift register memory block holding the synaptic weight, $T_{ij}$. A 3-bit bus for the 5 neural states runs horizontally above each synaptic row. Single phase dynamic CMOS has been used with a clock frequency in excess of 20 MHz [13]. Details of a synaptic operator are shown in figure 5. The synaptic weight $T_{ij}$ cycles around the shift register and the neural state $V_j$ is present on the state bus. During the first clock cycle, the synaptic weight is multiplied by the neural state and during the second, the most significant bit (MSBit) of the resultant $T_{ij} V_j$ is sign-extended for 
8 bits to allow for word growth in the running summation. A least significant bit (LSBit) signal running down the synaptic columns indicates the arrival of the LSBit of the $x_j$ running total. If the neural state is \u00b10.5 the synaptic weight is right-shifted by 1 bit and then added to or subtracted from the running total. A multiplication of \u00b11 adds or subtracts the weight from the total and multiplication by 0 does not alter the running summation. \n\n[Figure 3 plots neural state $V_j$ against activity $x_j$ for the threshold and \"5 STATE\" functions, with \"Sharper\" and \"Smoother\" transitions marked.] \n\nFigure 3. \"Hard-threshold\", 5-state and sigmoid activation functions. \n\nFigure 4. Section of the synaptic array of the 5-state activation function neural network. \n\n[Figure 5 diagram labels: 0.5; 0.0; Add/Subtract; Carry.] \n\nFigure 5. The synaptic operator with a 5-state activation function. \n\nThe final summation at the foot of the column is thresholded externally according to the 5-state activation function in figure 3. As the neuron activity $x_j$ increases through a threshold value $x_t$, ideal sigmoidal activation represents a smooth switch of neural state from -1 to 1. The 5-state \"staircase\" function gives a superficially much better approximation to the sigmoid form than a (much simpler to implement) threshold function. The sharpness of the transition can be controlled to \"tune\" the neural dynamics for learning and computation. The control parameter is referred to as temperature by analogy with statistical functions with this sigmoidal form. 
High \"temperature\" gives a smoother staircase and sigmoid, while a temperature of 0 reduces both to the \"Hopfield\"-like threshold function. The effects of temperature on both learning and recall for the threshold and 5-state activation options are discussed in section 4. \n\n4. LEARNING AND RECALL WITH VLSI CONSTRAINTS \n\nBefore implementing the reduced-arithmetic network in VLSI, simulation experiments were conducted to verify that the 5-state model represented a worthwhile enhancement over simple threshold activation. The \"benchmark\" problem was chosen for its ubiquitousness, rather than for its intrinsic value. The implications for learning and recall of the 5-state model, the threshold (2-state) model and smooth sigmoidal activation ($\\infty$-state) were compared at varying temperatures with a restricted dynamic range for the weights $T_{ij}$. In each simulation a totally interconnected 64-node network attempted to learn 32 random patterns using the delta rule learning algorithm (see for example [14]). Each pattern was then corrupted with 25% noise and recall attempted to probe the content addressable memory properties under the three different activation options. \n\nDuring learning, individual weights can become large (positive or negative). When weights are \"driven\" beyond the maximum value in a hardware implementation, which is determined by the size of the synaptic weight blocks, some limiting mechanism must be introduced. For example, with eight-bit weight registers, the limitation is $-128 \\le T_{ij} \\le 127$. 
With integer weights, this can be seen to be a problem of dynamic range, where it is the relationship between the smallest possible weight (\u00b11) and the largest (+127/-128) that is the issue. \n\nResults: Fig. 6 shows examples of the results obtained, studying learning using 5-state activation at different temperatures, and recall using both 5-state and threshold activation. At temperature $T=0$, the 5-state and threshold models are degenerate, and the results identical. Increasing smoothness of activation (temperature) during learning improves the quality of learning regardless of the activation function used in recall, as more patterns are recognised successfully. Using 5-state activation in recall is more effective than simple threshold activation. The effect of dynamic range restrictions can be assessed from the horizontal axis, where the limit on $T_{ij}$ is shown. The results from these and many other experiments may be summarised as follows: \n\n5-State activation vs. threshold: \n\n1) Learning with 5-state activation was protracted over the threshold activation, as binary patterns were being learnt, and the inclusion of intermediate values added extra degrees of freedom. \n\n2) Weight sets learnt using the 5-state activation function were \"better\" than those learnt via threshold activation, as the recall properties of both 5-state and threshold networks using such a weight set were more robust against noise. \n\n3) Full sigmoidal activation was better than 5-state, but the enhancement was less significant than that obtained by moving from threshold to 5-state. 
This suggests that the law of diminishing returns applies to addition of levels to the neural state $V_i$. This issue has been studied mathematically [15], with results that agree qualitatively with ours. \n\nWeight Saturation: \n\nThree methods were tried to deal with weight saturation. Firstly, a decay, or \"forgetting\", term was included in the learning cycle [1]. It is our view that this technique can produce the desired weight limiting property, but in the time available for experiments, we were unable to \"tune\" the rate of decay sufficiently well to confirm it. Renormalisation of the weights (division to bring large weights back into the dynamic range) was very unsuccessful, suggesting that information distributed throughout the numerically small weights was being destroyed. Finally, the weights were allowed to \"clip\" (i.e. any weight outside the dynamic range was set to the maximum allowed value). This method proved very successful, as the learning algorithm adjusted the weights over which it still had control to compensate for the saturation effect. It is interesting to note that other experiments have indicated that Hopfield nets can \"forget\" in a different way, under different learning control, giving preference to recently acquired memories [16]. The results from the saturation experiments were: \n\n1) For the 32 pattern/64 node problem, integer weights with a dynamic range greater than \u00b130 were necessary to give enough storage capability. \n\n2) For weights with maximum values $T_{ij}$ = 50-70, \"clipping\" occurs, but network performance is not seriously degraded over that with an unrestricted weight set. \n\n
\n\n[Figure 6 plots: two panels, \"5-state activation function recall\" and \"Hopfield activation function recall\"; horizontal axis \"Limit\", marked from 20 to 70; curves for T=30, T=20, T=10 and T=0.] \n\nFigure 6. Recall of patterns learned with the 5-state activation function and subsequently restored using the 5-state and the hard-threshold activation functions. T is the \"temperature\", or smoothness of the activation function, and \"limit\" the value of $T_{ij}$. \n\nThese results showed that the 5-state model was worthy of implementation as a VLSI neural board, and suggested that 8-bit weights were sufficient. \n\n5. PROJECTED SPECIFICATION OF A HARDWARE NEURAL BOARD \n\nThe specification of a 64-neuron board is given here, using a 5-state bit-serial 64 x 64 synapse array with a derated clock speed of 20 MHz. The synaptic weights are 8-bit words and the word length of the running summation $x_i$ is 16 bits to allow for growth. A 64-synapse column has a computational latency of 80 clock cycles or bits, giving an update time of 4 \u03bcs for the network. The time to load the weights into the array is limited to 60 \u03bcs by the supporting RAM, with an access time of 120 ns. 
These load and update times mean that the network is executing $1 \\times 10^9$ operations/second, where one operation is \u00b1$T_{ij} V_j$. This is much faster than a natural neural network, and much faster than is necessary in a hardware accelerator. We have therefore developed a \"paging\" architecture that effectively \"trades off\" some of this excessive speed against increased network size. \n\nA \"moving-patch\" neural board: An array of the 5-state synapses is currently being fabricated as a VLSI integrated circuit. The shift registers and the adder/subtractor for each synapse occupy a disappointingly large silicon area, allowing only a 3 x 9 synaptic array. To achieve a suitable size neural network from this array, several chips need to be included on a board with memory and control circuitry. The \"moving patch\" concept is shown in figure 7, where a small array of synapses is passed over a much larger $n \\times n$ synaptic array. \n\nEach time the array is \"moved\" to represent another set of synapses, new weights must be loaded into it. For example, the first set of weights will be $T_{11} \\ldots T_{1j}$, $T_{21} \\ldots T_{2j}$, to $T_{jj}$, the second set will begin at $T_{j+1,1}$, and so on. The final weight to be loaded will be $T_{nn}$. Static, off-the-shelf RAM is used to store the weights and the whole operation is pipelined for maximum efficiency. Figure 8 shows the board level design for the network. \n\n[Figure 7 diagram labels: $n$ neurons, $n \\times n$ synaptic array; smaller \"patch\" moves over array.] \n\nFigure 7. The \"moving patch\" concept, passing a small synaptic \"patch\" over a larger $n \\times n$ synapse array. \n\n[Figure 8 diagram labels: Synaptic Accelerator Chips; Control; Host.] \n\nFigure 8. A \"moving patch\" neural network board. 
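\n\nAs a rough consistency check on the 64-neuron throughput quoted earlier in this section, the arithmetic can be sketched as below. The sketch assumes one operation per synapse per network update and counts only the update time, not the weight-load time; both are assumptions of this sketch rather than statements from the specification. \n\n$$t_{\\mathrm{update}} = \\frac{80\\ \\mathrm{cycles}}{20\\ \\mathrm{MHz}} = 4\\ \\mu\\mathrm{s}, \\qquad \\frac{64 \\times 64\\ \\mathrm{operations}}{4\\ \\mu\\mathrm{s}} \\approx 1.0 \\times 10^{9}\\ \\mathrm{operations/second}$$ \n\nIf the weight-load time were charged to every update, the figure would be correspondingly lower, so the estimate is best read as a peak rate once the weights are resident in the array. 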
\n\nThe small \"patch\" that moves around the array to give $n$ neurons comprises 4 VLSI synaptic accelerator chips to give a 6 x 18 synaptic array. The number of neurons to be simulated is 256 and the weights for these are stored in 0.5 Mb of RAM with a load time of 8 ms. For each \"patch\" movement, the partial running summation, calculated for each column, is stored in a separate RAM until it is required to be added into the next appropriate summation. The update time for the board is 3 ms, giving $2 \\times 10^7$ operations/second. This is slower than the 64-neuron specification, but the network is 16 times larger, as the arithmetic elements are being used more efficiently. To achieve a network of greater than 256 neurons, more RAM is required to store the weights. The network is then slower unless a larger number of accelerator chips is used to give a larger moving \"patch\". \n\n6. CONCLUSIONS \n\nA strategy and design method has been given for the construction of bit-serial VLSI neural network chips and circuit boards. Bit-serial arithmetic, coupled to a reduced arithmetic style, enhances the level of integration possible beyond more conventional digital, bit-parallel schemes. The restrictions imposed on both synaptic weight size and arithmetic precision by VLSI constraints have been examined and shown to be tolerable, using the associative memory problem as a test. \n\nWhile we believe our digital approach to represent a good compromise between arithmetic accuracy and circuit complexity, we acknowledge that the level of integration is disappointingly low. 
It is our belief that, while digital approaches may be interesting and useful in the medium term, essentially as hardware accelerators for neural simulations, analog techniques represent the best ultimate option in 2-dimensional silicon. To this end, we are currently pursuing techniques for analog pseudo-static memory, using standard CMOS technology. In any event, the full development of a nonvolatile analog memory technology, such as the MNOS technique [7], is key to the long-term future of VLSI neural nets that can learn. \n\n7. ACKNOWLEDGEMENTS \n\nThe authors acknowledge the support of the Science and Engineering Research Council (UK) in the execution of this work. \n\nReferences \n\n1. S. Grossberg, \"Some Physiological and Biochemical Consequences of Psychological Postulates,\" Proc. Natl. Acad. Sci. USA, vol. 60, pp. 758-765, 1968. \n\n2. H. P. Graf, L. D. Jackel, R. E. Howard, B. Straughn, J. S. Denker, W. Hubbard, D. M. Tennant, and D. Schwartz, \"VLSI Implementation of a Neural Network Memory with Several Hundreds of Neurons,\" Proc. AIP Conference on Neural Networks for Computing, Snowbird, pp. 182-187, 1986. \n\n3. W. S. Mackie, H. P. Graf, and J. S. Denker, \"Microelectronic Implementation of Connectionist Neural Network Models,\" IEEE Conference on Neural Information Processing Systems, Denver, 1987. \n\n4. J. J. Hopfield and D. W. Tank, \"'Neural' Computation of Decisions in Optimisation Problems,\" Biol. Cybern., vol. 52, pp. 141-152, 1985. \n\n5. M. A. Sivilotti, M. A. Mahowald, and C. A. Mead, Real-Time Visual Computations Using Analog CMOS Processing Arrays, 1987. To be published. \n\n6. C. A. 
Mead, \"Networks for Real-Time Sensory Processing,\" IEEE Conference on Neural Information Processing Systems, Denver, 1987. \n\n7. J. P. Sage, K. Thompson, and R. S. Withers, \"An Artificial Neural Network Integrated Circuit Based on MNOS/CCD Principles,\" Proc. AIP Conference on Neural Networks for Computing, Snowbird, pp. 381-385, 1986. \n\n8. S. C. J. Garth, \"A Chipset for High Speed Simulation of Neural Network Systems,\" IEEE Conference on Neural Networks, San Diego, 1987. \n\n9. A. F. Murray and A. V. W. Smith, \"A Novel Computational and Signalling Method for VLSI Neural Networks,\" European Solid State Circuits Conference, 1987. \n\n10. A. F. Murray and A. V. W. Smith, \"Asynchronous Arithmetic for VLSI Neural Systems,\" Electronics Letters, vol. 23, no. 12, p. 642, June 1987. \n\n11. A. F. Murray and A. V. W. Smith, \"Asynchronous VLSI Neural Networks using Pulse Stream Arithmetic,\" IEEE Journal of Solid-State Circuits and Systems, 1988. To be published. \n\n12. M. E. Gaspar, \"Pulsed Neural Networks: Hardware, Software and the Hopfield A/D Converter Example,\" IEEE Conference on Neural Information Processing Systems, Denver, 1987. \n\n13. M. S. McGregor, P. B. Denyer, and A. F. Murray, \"A Single-Phase Clocking Scheme for CMOS VLSI,\" Advanced Research in VLSI: Proceedings of the 1987 Stanford Conference, 1987. \n\n14. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, \"Learning Internal Representations by Error Propagation,\" Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318-362, 1986. \n\n15. M. Fleisher and E. 
Levin, \"The Hopfield Model with Multilevel Neurons Models,\" IEEE Conference on Neural Information Processing Systems, Denver, 1987. \n\n16. G. Parisi, \"A Memory that Forgets,\" J. Phys. A: Math. Gen., vol. 19, pp. L617-L620, 1986. \n", "award": [], "sourceid": 27, "authors": [{"given_name": "Alan", "family_name": "Murray", "institution": null}, {"given_name": "Anthony", "family_name": "Smith", "institution": null}, {"given_name": "Zoe", "family_name": "Butler", "institution": null}]}