{"title": "Introduction to a System for Implementing Neural Net Connections on SIMD Architectures", "book": "Neural Information Processing Systems", "page_first": 804, "page_last": 813, "abstract": null, "full_text": "804 \n\nINTRODUCTION TO A  SYSTEM FOR IMPLEMENTING NEURAL  NET \n\nCONNECTIONS ON SIMD  ARCHITECTURES \n\nSherryl Tomboulian \n\nInstitute for  Computer Applications  in Science and Engineering \n\nNASA  Langley Research Center, Hampton VA  23665 \n\nABSTRACT \n\nNeural  networks  have  attracted  much  interest  recently,  and  using  parallel \n\narchitectures  to simulate  neural  networks  is  a  natural  and  necessary  applica(cid:173)\ntion.  The  SIMD  model  of parallel  computation is  chosen,  because systems  of \nthis  type  can  be  built  with  large  numbers  of processing  elements.  However, \nsuch systems are not naturally suited to generalized communication.  A method \nis  proposed  that  allows  an implementation of neural  network  connections  on \nmassively parallel SIMD  architectures.  The key to this system is an algorithm \nthat allows  the formation  of arbitrary  connections  between  the \"neurons\".  A \nfeature  is  the ability  to add  new  connections  quickly.  It also  has error  recov(cid:173)\nery  ability and  is  robust  over a  variety  of network  topologies.  Simulations  of \nthe general  connection system, and its implementation on the Connection Ma(cid:173)\nchine,  indicate  that  the  time and  space  requirements  are  proportional  to  the \nproduct of the  average number of connections per neuron and the diameter of \nthe interconnection network. \n\nINTRODUCTION \n\nNeural Networks hold  great promise for biological research, artificial intelli(cid:173)\n\ngence,  and even as  general computational devices.  However,  to study systems \nin  a  realistic  manner,  it  is  highly  desirable  to  be  able  to  simulate  a  network \nwith tens of thousands or hundreds of thousands of neurons.  
This suggests the use of parallel hardware. The most natural method of exploiting parallelism would have each processor simulate a single neuron. \n\nConsider the requirements of such a system. There should be a very large number of processing elements which can work in parallel. The computation that occurs at these elements is simple and based on local data. The processing elements must be able to have connections to other elements. All connections in the system must be able to be traversed in parallel. Connections must be added and deleted dynamically. \n\nGiven current technology, the only type of parallel model that can be constructed with tens of thousands or hundreds of thousands of processors is an SIMD architecture. In exchange for being able to build a system with so many processors, there are some inherent limitations. SIMD stands for single instruction, multiple data1, which means that all processors can work in parallel, but they must do exactly the same thing at the same time. This machine model is sufficient for the computation required within a neuron; however, in such a system it is difficult to implement arbitrary connections between neurons. The Connection Machine2 provides such a model, but uses a device called the router to deliver messages. The router is a complex piece of hardware that uses significant chip area, and without the additional hardware for the router, a machine could be built with significantly more processors. \n\nThis work was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-18010-7 while the author was in residence at ICASE. \n\n© American Institute of Physics 1988 \n\n
Since one of the objectives is to maximize the number of \"neurons\", it is desirable to eliminate the extra cost of a hardware router and instead use a software method. \n\nExisting software algorithms for forming connections on SIMD machines are not sufficient for the requirements of a neural network. They restrict the form of graph (neural network) that can be embedded to permutations3,4 or sorts5,6 combined with generalized connection networks7; the methods are network specific, and adding a new connection is highly time consuming. \n\nThe software routing method presented here is a unique algorithm which allows arbitrary neural networks to be embedded in machines with a wide variety of network topologies. The advantages of such an approach are numerous: A new connection can be added dynamically in the same amount of time that it takes to perform a parallel traversal of all connections. The method has error recovery ability in case of network failures. This method has relationships with natural neural models. When a new connection is to be formed, the two neurons being connected are activated, and then the system forms the connection without any knowledge of the \"address\" of the neuron-processors and without any instruction as to the method of forming the connecting path. The connections are entirely distributed; a processor only knows that connections pass through it - it doesn't know a connection's origin or final destination. \n\nSome neural network applications have been implemented on massively parallel architectures, but they have run into restrictions due to communication. 
An implementation on the Connection Machine8 discovered that it was more desirable to cluster processors in groups, and have each processor in a group represent one connection, rather than having one processor per neuron, because the router is designed to deliver one message at a time from each processor. This approach is contrary to the more natural paradigm of having one processor represent a neuron. The MPP9, a massively parallel architecture with processors arranged in a mesh, has been used to implement neural nets10, but because of a lack of generalized communication software, the method for edge connections is a regular communication pattern with all neurons within a specified distance. This is not an unreasonable approach, since within the brain neurons are usually locally connected, but there is also a need for longer connections between groups of neurons. The algorithms presented here can be used on both machines to facilitate arbitrary connections with an irregular number of connections at each processor. \n\nMACHINE MODEL \n\nAs mentioned previously, since we desire to build a system with a large number of processing elements, the only technology currently available for building such large systems is the SIMD architecture model. In the SIMD model there is a single control unit and a very large number of slave processors that can execute the same instruction stream simultaneously. It is possible to disable some processors so that only some execute an instruction, but it is not possible to have two processors performing different instructions at the same time. The processors have exclusively local memory which is small (only a few thousand bits), and they have no facilities for local indirect addressing. 
In this scheme an instruction involves both a particular operation code and the local memory address. All processors must do the same thing to the same areas of their local memory at the same time. \n\nThe basic model of computation is bit-serial - each instruction operates on a bit at a time. To perform multiple bit operations, such as integer addition, requires several instructions. This model is chosen because it requires less hardware logic, and so would allow a machine to be built with a larger number of processors than could otherwise be achieved with a standard word-oriented approach. Of course, the algorithms presented here will also work for machines with more complex instruction abilities; the machine model described satisfies the minimal requirements. \n\nAn important requirement for connection formation is that the processors are connected in some topology. For instance, the processors might be connected in a grid so that each processor has a North, South, East, and West neighbor. The methods presented here work for a wide variety of network topologies. The requirements are: (1) there must be some path between any two processors; (2) every neighbor link must be bi-directional, i.e. if A is a neighbor of B, then B must be a neighbor of A; (3) the neighbor relations between processors must have a consistent invertible labeling. A more precise definition of the labeling requirements can be found in 11. It suffices that most networks12, including grid, hypercube, cube connected cycles13, shuffle exchange14, and mesh of trees15, are admissible under the scheme. 
Additional requirements are that the processors be able to read from or write to their neighbors' memories, and that at least one of the processors acts as a serial port between the processors and the controller. \n\nCOMPUTATIONAL REQUIREMENTS \n\nThe machine model described here is sufficient for the computational requirements of a neuron. Adopt the paradigm that each processor represents one neuron. While several different models of neural networks exist with slightly different features, they are all fairly well characterized by computing a sum or product of the neighbors' values, and if a certain threshold is exceeded, then the processor-neuron will fire, i.e. activate other neurons. The machine model described here is more efficient at boolean computation, such as described by McCulloch and Pitts16, since it is bit serial. Neural net models using integers and floating point arithmetic17,18 will also work but will be somewhat slower, since the time for computation is proportional to the number of bits of the operands. \n\nThe only computational difficulty lies in the fact that the system is SIMD, which means that the processes are synchronous. For some neural net models this is sufficient18; however, others require asynchronous behavior17. This can easily be achieved simply by turning the processors on and off based on a specified probability distribution. (For a survey of some different neural networks see 19.) \n\nCONNECTION ASSUMPTIONS \n\nMany models of neural networks assume fully connected systems. This model is considered unrealistic, and the method presented here will work better for models that contain more sparsely connected systems. 
While the method will work for dense connections, the time and space required is proportional to the number of edges, and becomes prohibitively expensive. \n\nOther than the sparseness assumptions, there are no restrictions on the topological form of the network being simulated. For example, multiple layered systems, slightly irregular structures, and completely random connections are all handled easily. The system does function better if there is locality in the neural network. These assumptions seem to fit the biological model of neurons. \n\nTHE CONNECTION FORMATION METHOD \n\nA fundamental part of a neural network implementation is the realization of the connections between neurons. This is done using a software scheme first presented in 11,20. The original method was intended for realizing directed graphs in SIMD architectures. Since a neural network is a graph with the neurons being vertices and the connections being arcs, the method maps perfectly to this system. Henceforth the terms neuron and vertex and the terms arc and connection will be used interchangeably. \n\nThe software system presented here for implementing the connections has several parts. Each processor will be assigned exactly one neuron. (Of course some processors may be \"free\" or unallocated, but even \"free\" processors participate in the routing process.) Each connection will be realized as a path in the topology of processors. A labeling of these paths in time and space is introduced which allows efficient routing algorithms, and a set-up strategy is introduced that allows new connections to be added quickly. \n\nThe standard computer science approach to forming the connection would be to store the addresses of the processors to which a given neuron is connected. 
Then, using a routing algorithm, messages could be passed to the processors with the specified destination. However, the SIMD architecture does not lend itself to standard message passing schemes because processors cannot do indirect addressing, so buffering of values is difficult and costly. \n\nInstead, a scheme is introduced which is closer to the natural neuron-synapse structures. Instead of having an address for each connection, the connection is actually represented as a fixed path between the processors, using time as a virtual dimension. The path a connection takes through the network of processors is statically encoded in the local memories of the neurons that it passes through. To achieve this, the following data structures will be resident at each processor. \n\nALLOCATED------boolean flag indicating whether this processor is assigned a vertex (neuron) in the graph \nVERTEX LABEL---label of graph vertex (neuron) \nHAS_NEIGHBOR[1..neighbor_limit]---flags indicating the existence of neighbors \nSLOTS[1..T]----arc path information: \n    START----------new arc starts here \n    DIRECTION------direction to send {1..neighbor_limit, FREE} \n    END------------arc ends here \n    ARC LABEL------label of arc \n\nThe ALLOCATED and VERTEX LABEL fields indicate that the processor has been assigned a vertex in the graph (neuron). The HAS_NEIGHBOR field is used to indicate whether a physical wire exists in the particular direction; it allows irregular network topologies and boundary conditions to be supported. The SLOTS data structure is the key to realizing the connections. It is used to instruct the processor where to send a message and to ensure that paths are constructed in such a way that no collisions will occur. \n\nSLOTS is an array with T elements. 
The value T is called the time quantum. Traversing all the edges of the embedded graph in parallel will take a certain amount of time, since messages must be passed along through a sequence of neighboring processors. Forming these parallel connections will be considered an uninterruptible operation which will take T steps. The SLOTS array is used to tell the processors what they should do on each relative time position within the time quantum. \n\nOne of the characteristics of this algorithm is that a fixed path is chosen to represent the connection between two processors, and once chosen it is never changed. For example, consider the grid below. \n\n--A--B--C--D--E--\n  |  |  |  |  |  \n--F--G--H--I--J--\n\nFig. 1. Grid Example \n\nIf there is an arc between A and H, there are several possible paths: East-East-South, East-South-East, and South-East-East. Only one of these paths will be chosen between A and H, and that same path will always be used. Besides being invariant in space, paths are also invariant in time. As stated above, traversal is done within a time quantum T. Paths do not have to start on time 1, but can be scheduled to start at some relative offset within the time quantum. Once the starting time for the path has been fixed, it is never changed. Another requirement is that a message cannot be buffered; it must proceed along the specified directions without interruption. For example, if the path is of length 3 and it starts at time 1, then it will arrive at time 4. Alternatively, if it starts at time 2 it will arrive at time 5. 
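The per-processor state described above can be sketched as a small data structure. This is a hedged illustration only: the field names mirror the ALLOCATED / VERTEX LABEL / HAS_NEIGHBOR / SLOTS listing, but the `Slot` and `Processor` class names, the Python representation, and the `make_processor` helper are assumptions, not the paper's bit-serial layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FREE = 0  # sentinel meaning "no direction assigned in this time slot"

@dataclass
class Slot:
    start: bool = False       # START: a new arc originates here at this time step
    direction: int = FREE     # DIRECTION: neighbor index in 1..neighbor_limit, or FREE
    end: bool = False         # END: an arc terminates here at this time step
    arc_label: Optional[int] = None  # ARC LABEL

@dataclass
class Processor:
    allocated: bool = False             # is a neuron assigned to this processor?
    vertex_label: Optional[int] = None  # VERTEX LABEL
    has_neighbor: List[bool] = field(default_factory=list)  # one flag per direction
    slots: List[Slot] = field(default_factory=list)         # T entries, one per time step

def make_processor(T: int, neighbor_limit: int) -> Processor:
    """Build an unallocated processor for a time quantum of T steps."""
    return Processor(has_neighbor=[False] * neighbor_limit,
                     slots=[Slot() for _ in range(T)])
```

Marking one step of a path is then just setting `slots[t].direction`; a slot whose direction is still `FREE` is available when a new path is being placed.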
Further, it is necessary to place the paths so that no collisions occur; that is, no two paths can be at the same processor at the same instant in time. Essentially time adds an extra dimension to the topology of the network, and within this space-time network all data paths must be non-conflicting. The rules for constructing paths that fulfill these requirements are listed below. \n\n• At most one connection can enter a processor at a given time, and at most one connection can leave a processor at a given time. It is possible to have both one coming and one going at the same time. Note that this does not mean that a processor can have only one connection; it means that it can have only one connection during any one of the T time steps. It can have as many as T connections going through it. \n\n• Any path between two processors (u,v) representing a connection must consist of steps at contiguous times. For example, if the path from processor u to processor v is u,f,g,h,v, then if the arc from u-f is assigned time 1, f-g must have time 2, g-h time 3, and h-v time 4. Likewise if u-f occurs at time 5, then arc h-v will occur at time 8. \n\nWhen these rules are followed in forming paths, the SLOTS structure can be used to mark the paths. Each path goes through neighboring processors at successive time steps. For each of these time steps the DIRECTION field of the SLOTS structure is marked, telling the processor which direction it should pass a message if it receives it on that time. SLOTS serves both to instruct the processors how to send messages, and to indicate that a processor is busy at a certain time slot so that when new paths are constructed it can be guaranteed that they won't conflict with current paths. \n\nConsider the following example. 
Suppose we are given the directed graph with vertices A,B,C,D and edges A->C, B->C, B->D, and D->A. This is to be done where A, B, C, and D have been assigned to successive elements of a linear array. (A linear array is not a good network for this scheme, but is a convenient source of examples.) \n\nFig. 2. Graph Example (logical connections) \n\nA, B, C, D are successive members in a linear array: \n\n1---2---3---4 \nA---B---C---D \n\nFirst, A->C can be completed with the map East-East, so Slots[A][1].direction = E, Slots[B][2].direction = E, Slots[C][2].end = 1. \n\nB->C can be done with the map East; it can start at time 1, since Slots[B][1].direction and Slots[C][1].end are free. \n\nB->D goes through C then to D; its map is East-East. B is occupied at times 1 and 2. It is free at time 3, so Slots[B][3].direction = E, Slots[C][4].direction = E, Slots[D][4].end = 1. \n\nD->A must go through C, B, A, using map West-West-West. D is free at time 1, C is free at time 2, but B is occupied at time 3. D is free at time 2, but C is occupied at time 3. It can start from D at time 3: Slots[D][3].direction = W, Slots[C][4].direction = W, Slots[B][5].direction = W, Slots[A][5].end = 1. \n\nEvery processor acts as a conduit for its neighbors' messages. No processor knows where any message is going to or coming from, but each processor knows what it must do to establish the local connections. \n\nThe use of contiguous time slots is vital to the correct operation of the system. If all edge-paths are established according to the above rules, there is a simple method for making the connections. 
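A first-fit placement in the spirit of this example can be sketched as follows. This is a hedged reconstruction: the `place_path` helper, the busy-set bookkeeping, and the tie-breaking are assumptions, and with this particular accounting of "coming" and "going" slots the start times found for the first three arcs match the worked text, while the last arc's start time may differ by a step depending on exactly how slot occupancy is counted.

```python
def place_path(route, out_busy, in_busy, T=16):
    """Greedily pick the earliest start time for a fixed route (a list of
    processor indices) such that no step conflicts with existing paths.
    out_busy / in_busy hold (processor, time) pairs already claimed for
    sending and receiving respectively."""
    hops = len(route) - 1
    for t0 in range(1, T):
        if all((route[i], t0 + i) not in out_busy and
               (route[i + 1], t0 + i) not in in_busy
               for i in range(hops)):
            for i in range(hops):
                out_busy.add((route[i], t0 + i))     # sender busy at this step
                in_busy.add((route[i + 1], t0 + i))  # receiver busy at this step
            return t0
    return None  # no start time fits inside the time quantum

# Processors 0..3 stand for A..D on the linear array; routes follow the text.
out_busy, in_busy = set(), set()
routes = {"A->C": [0, 1, 2], "B->C": [1, 2], "B->D": [1, 2, 3], "D->A": [3, 2, 1, 0]}
starts = {name: place_path(r, out_busy, in_busy) for name, r in routes.items()}
```

Because placed paths are never re-routed, the order in which arcs are added matters; the greedy search simply slides each new path later in the time quantum until it fits.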
The paths have been restricted so that there will be no collisions, and paths' directions use consecutive time slots. Hence if all arcs at time i send a message to their neighbors, then each processor is guaranteed no more than 1 message coming to it. The end of a path is specified by setting a separate bit that is tested after each message is received. A separate start bit indicates when a path starts. The start bit is needed because the SLOTS array just tells the processors where to send a message, regardless of how that message arrived. The start array indicates when a message originates, as opposed to arriving from a neighbor. \n\nThe following algorithm is basic to the routing system. \n\nfor i = 1 to T \n    FORALL processors \n        /* if an arc starts or is passing through at this time */ \n        if SLOT[i].START = 1 or active = 1 \n            for j = 1 to neighbor_limit \n                if SLOT[i].DIRECTION = j \n                    write message bit to in-box of neighbor j; \n            set active = 0; \n    FORALL processors that just received a message \n        if SLOT[i].END \n            move in-box to message-destination; \n        else \n            move in-box to out-box; \n            set active bit = 1; \n\nThis code follows the method mentioned above. The time slots are looped through and the messages are passed in the appropriate directions as specified in the SLOTS array. Two bits, in-box and out-box, are used for message passing so that an out-going message won't be overwritten by an in-coming message before it gets transferred. The inner loop for j = 1 to neighbor_limit checks each of the possible neighbor directions and sends the message to the correct neighbor. For instance, in a grid the neighbor limit is 4, for North, South, East, and West neighbors. The time complexity of data movement is O(T × neighbor_limit). 
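As a concreteness check, the traversal loop above can be mimicked serially. This is a hedged sketch, not the bit-serial SIMD code: the dictionary-based `slots`/`starts`/`ends` encoding and the `traverse` name are assumptions, though the in-box/active-bit logic follows the pseudocode (with one simplification: a processor that both originates and relays a message in the same step is not handled).

```python
def traverse(n, T, slots, starts, ends):
    """Serial re-enactment of the SIMD routing loop.
    slots[p][t] -> +1 (East) or -1 (West) on a linear array, absent if idle;
    starts[(p, t)] -> message injected at processor p at time t;
    ends -> set of (p, t) where an arc terminates."""
    active = {}     # processor -> message currently passing through it
    delivered = []  # (processor, time, message) tuples, in arrival order
    for t in range(1, T + 1):
        sends = []
        for p in range(n):
            # a message leaves p if an arc starts here now, or one is in transit
            msg = starts.get((p, t), active.pop(p, None))
            if msg is not None and t in slots.get(p, {}):
                sends.append((p + slots[p][t], msg))
        for q, msg in sends:
            if (q, t) in ends:
                delivered.append((q, t, msg))  # END bit set: absorb the message
            else:
                active[q] = msg                # hold it for the next time step
    return delivered

# The A->C (start 1, maps E,E) and B->C (start 1, map E) arcs from the example:
slots = {0: {1: +1}, 1: {1: +1, 2: +1}}
starts = {(0, 1): "a", (1, 1): "b"}
ends = {(2, 1), (2, 2)}
result = traverse(4, 4, slots, starts, ends)
```

Running this, the single-hop B->C message arrives at time 1 and the two-hop A->C message arrives at time 2, with no processor ever holding more than one message per step.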
\n\nSETTING UP CONNECTIONS \n\nOne of the goals in developing this system was to have a method for adding new connections quickly. Paths are added so that they don't conflict with any previously constructed path. Once a path is placed it will not be re-routed by the basic placement algorithm; it will always start at the same spot at the same time. The basic idea of the method for placing a connection is to start from the source processor and in parallel examine all possible paths outward from it that do not conflict with pre-established paths and which adhere to the sequential time constraint. As the trial paths are flooding the system, they are recorded in temporary storage. At the end of this deluge of trial paths all possible paths will have been examined. If the destination processor has been reached, then a path exists under the current time-space restrictions. Using the stored information a path can be backtraced and recorded in the SLOTS structure. This is similar to the Lee-Moore routing algorithm21,22 for finding a path in a system, but with the sequential time restriction. \n\nFor example, suppose that the connection (u,v) is to be added. First it is assumed that processors for u and v have already been determined; otherwise (as a simplification) assume a random allocation from a pool of free processors. A parallel breadth-first search will be performed starting from the source processor. During the propagation phase a processor which receives a message checks its SLOTS array to see if it is busy on that time step; if not, it will propagate to its neighbors on the next time step. For instance, suppose a trial path starts at time 1 and moves to a neighboring processor, but that neighbor is already busy at time 1 (as can be seen by examining the DIRECTION slot). 
Since a path that would go through this neighbor at this time is not legal, the trial path would commit suicide; that is, it stops propagating itself. If the processor slot for time 2 was free, the trial path would attempt to propagate to all of its neighbors at time 3. \n\nUsing this technique paths can be constructed with essentially no knowledge of the relative locations of the \"neurons\" being connected or the underlying topology. Variations on the outlined method, such as choosing the shortest path, can improve the choice of paths with very little overhead. If the entire network were known ahead of time, an off-line method could be used to construct the paths more efficiently; work on off-line methods is underway. However, the simple elegance of this basic method holds great appeal for systems that change slowly over time in unpredictable ways. \n\nPERFORMANCE \n\nAdding an edge (assuming one can be added), deleting any set of edges, or traversing all the edges in parallel, all have time complexity O(T × neighbor_limit). If it is assumed that neighbor_limit is a small constant, then the complexity is O(T). Since T is related both to the time and space needed, it is a crucial factor in determining the value of the algorithms presented. Some analytic bounds on T were presented in 11, but it is difficult to get a tight bound on T for general interconnection networks and dynamically changing graphs. A simulator was constructed to examine the behavior of the algorithms. Besides the simulated data, the algorithms mentioned were actually implemented for the Connection Machine. The data produced by the simulator is consistent with that produced by the real machine. 
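The set-up search whose cost is being measured here can be sketched serially as a breadth-first search through (processor, time) space. This is a hedged illustration: the `find_path` helper, the adjacency-list encoding, and the serial formulation are assumptions standing in for the parallel flood of trial paths described above.

```python
from collections import deque

def find_path(adj, out_busy, in_busy, src, dst, T):
    """Find the earliest-starting collision-free path from src to dst.
    adj[p] lists p's neighbors; out_busy / in_busy hold (processor, time)
    pairs already claimed by placed paths. Returns (start_time, route)
    or None if no path fits inside the time quantum T."""
    for t0 in range(1, T):           # try successively later start offsets
        frontier = deque([(src, t0, [src])])
        seen = {(src, t0)}
        while frontier:
            p, t, route = frontier.popleft()
            if p == dst:
                return t0, route     # destination reached; route is the backtrace
            if t >= T:
                continue             # trial path ran past the time quantum
            for q in adj[p]:
                # a trial path commits suicide if this hop conflicts
                if (p, t) in out_busy or (q, t) in in_busy or (q, t + 1) in seen:
                    continue
                seen.add((q, t + 1))
                frontier.append((q, t + 1, route + [q]))
    return None

# A 4-processor linear array with no pre-existing paths:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hit = find_path(adj, set(), set(), 0, 2, 8)
```

Blocking the source's outgoing slot at time 1 (adding `(0, 1)` to `out_busy`) forces the same route to slide to a start time of 2, which is the behavior the trial-path rules require.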
The major result is that the size of T appears proportional to the average degree of the graph times the diameter of the interconnection network20. \n\nFURTHER RESEARCH \n\nThis paper has been largely concerned with a system that can realize the connections in a neural network when the two neurons to be joined have been activated. The tests conducted have been concerned with the validity of the method for implementing connections, rather than with a full simulation of a neural network. Clearly this is the next step. \n\nA natural extension of this method is a system which can form its own connections based solely on the activity of certain neurons, without having to explicitly activate the source and destination neurons. This is an exciting avenue, and further results should be forthcoming. \n\nAnother area of research involves the formation of branching paths. The current method takes an arc in the neural network and realizes it as a unique path in space-time. A variation that has similarities to dendritic structure would allow a path coming from a neuron to branch and go to several target neurons. This extension would allow for a much more economical embedding system. Simulations are currently underway. \n\nCONCLUSIONS \n\nA method has been outlined which allows the implementation of neural net connections on a class of parallel architectures which can be constructed with very large numbers of processing elements. To economize on hardware so as to maximize the number of processing elements buildable, it was assumed that the processors only have local connections; no hardware is provided for communication. 
Some simple algorithms have been presented which allow neural nets with arbitrary connections to be embedded in SIMD architectures having a variety of topologies. The time for performing a parallel traversal and for adding a new connection appears to be proportional to the diameter of the topology times the average number of arcs in the graph being embedded. In a system where the topology has diameter O(log N), and where the degree of the graph being embedded is bounded by a constant, the time is apparently O(log N). This makes it competitive with existing methods for SIMD routing, with the advantages that there are no a priori requirements for the form of the data, and the topological requirements are extremely general. Also, with our approach new arcs can be added without reconfiguring the entire system. The simplicity of the implementation and the flexibility of the method suggest that it could be an important tool for using SIMD architectures for neural network simulation. \n\nBIBLIOGRAPHY \n\n1. M.J. Flynn, \"Some computer organizations and their effectiveness\", IEEE Trans. Comput., Vol. C-21, No. 9, pp. 948-960. \n2. W. Hillis, \"The Connection Machine\", MIT Press, Cambridge, Mass., 1985. \n3. D. Nassimi, S. Sahni, \"Parallel Algorithms to Set-up the Benes Permutation Network\", Proc. Workshop on Interconnection Networks for Parallel and Distributed Processing, April 1980. \n4. D. Nassimi, S. Sahni, \"Benes Network and Parallel Permutation Algorithms\", IEEE Transactions on Computers, Vol. C-30, No. 5, May 1981. \n5. D. Nassimi, S. Sahni, \"Parallel Permutation and Sorting Algorithms and a New Generalized Connection Network\", JACM, Vol. 29, No. 3, July 1982, pp. 642-667. \n6. K.E. Batcher, \"Sorting Networks and their Applications\", The Proceedings of AFIPS 1968 SJCC, 1968, pp. 307-314. 
\n7. C. Thompson, \"Generalized connection networks for parallel processor intercommunication\", IEEE Trans. Computers, Vol. C-27, Dec. 1978, pp. 1119-1125. \n8. Nathan H. Brown, Jr., \"Neural Network Implementation Approaches for the Connection Machine\", presented at the 1987 conference on Neural Information Processing Systems - Natural and Synthetic. \n9. K.E. Batcher, \"Design of a massively parallel processor\", IEEE Trans. on Computers, Sept. 1980, pp. 836-840. \n10. H.M. Hastings, S. Waner, \"Neural Nets on the MPP\", Frontiers of Massively Parallel Scientific Computation, NASA Conference Publication 2478, NASA Goddard Space Flight Center, Greenbelt, Maryland, 1986. \n11. S. Tomboulian, \"A System for Routing Arbitrary Communication Graphs on SIMD Architectures\", Doctoral Dissertation, Dept. of Computer Science, Duke University, Durham, NC. \n12. T. Feng, \"A Survey of Interconnection Networks\", Computer, Dec. 1981, pp. 12-27. \n13. F. Preparata and J. Vuillemin, \"The Cube Connected Cycles: a Versatile Network for Parallel Computation\", Comm. ACM, Vol. 24, No. 5, May 1981, pp. 300-309. \n14. H. Stone, \"Parallel processing with the perfect shuffle\", IEEE Trans. Computers, Vol. C-20, Feb. 1971, pp. 153-161. \n15. T. Leighton, \"Parallel Computation Using Meshes of Trees\", Proc. International Workshop on Graph Theoretic Concepts in Computer Science, 1983. \n16. W.S. McCulloch and W. Pitts, \"A Logical Calculus of the Ideas Immanent in Nervous Activity\", Bulletin of Mathematical Biophysics, Vol. 5, 1943, pp. 115-133. \n17. J.J. Hopfield, \"Neural networks and physical systems with emergent collective computational abilities\", Proc. Natl. Acad. Sci., Vol. 79, April 1982, pp. 2554-2558. \n18. T. Kohonen, \"Self-Organization and Associative Memory\", Springer-Verlag, Berlin, 1984. 
\n19. R.P. Lippmann, \"An Introduction to Computing with Neural Nets\", IEEE ASSP Magazine, April 1987, pp. 4-22. \n20. S. Tomboulian, \"A System for Routing Directed Graphs on SIMD Architectures\", ICASE Report No. 87-14, NASA Langley Research Center, Hampton, VA. \n21. C.Y. Lee, \"An algorithm for path connections and its applications\", IRE Trans. Elec. Comput., Vol. EC-10, Sept. 1961, pp. 346-365. \n22. E.F. Moore, \"Shortest path through a maze\", Annals of the Computation Laboratory, Vol. 30, Cambridge, MA: Harvard Univ. Press, 1959, pp. 285-292. \n", "award": [], "sourceid": 35, "authors": [{"given_name": "Sherryl", "family_name": "Tomboulian", "institution": null}]}