{"title": "Neural Network Implementation Approaches for the Connection Machine", "book": "Neural Information Processing Systems", "page_first": 127, "page_last": 136, "abstract": null, "full_text": "Neural Network Implementation Approaches \n\nfor the \n\nConnection Machine \n\nNathan H. Brown, Jr. \n\nMRJ/Perkin Elmer, 10467 White Granite Dr. (Suite 304), Oakton, Va. 22124 \n\nABSTRACT \n\nThe SIMD parallelism of the Connection Machine (CM) allows the construction of \nneural network simulations by the use of simple data and control structures.  Two \napproaches are described which allow parallel computation of a model's nonlinear \nfunctions, parallel modification of a model's weights, and parallel propagation of a \nmodel's activation and error.  Each approach also allows a model's interconnect \nstructure to be physically dynamic.  A Hopfield model is implemented with each \napproach at six sizes over the same number of CM processors to provide a performance \ncomparison. \n\nINTRODUCTION \n\nSimulations of neural network models on digital computers perform various \ncomputations by applying linear or nonlinear functions, defined in a program, to \nweighted sums of integer or real numbers retrieved and stored by array reference.  The \nnumerical values are model dependent parameters like time averaged spiking frequency \n(activation), synaptic efficacy (weight), the error in error back propagation models, and \ncomputational temperature in thermodynamic models.  The interconnect structure of a \nparticular model is implied by indexing relationships between arrays defined in a \nprogram.  On the Connection Machine (CM), these relationships are expressed in \nhardware processors interconnected by a 16-dimensional hypercube communication \nnetwork.  Mappings are constructed to define higher dimensional interconnectivity \nbetween processors on top of the fundamental geometry of the communication \nnetwork.  
Parallel transfers are defined over these mappings.  These mappings may be \ndynamic.  CM parallel operations transform array indexing from a temporal succession \nof references to memory to a single temporal reference to spatially distributed \nprocessors. \n\nTwo alternative approaches to implementing neural network simulations on the CM \nare described.  Both approaches use \"data parallelism\" [1] provided by the *Lisp virtual \nmachine.  Data and control structures associated with each approach and performance \ndata for a Hopfield model implemented with each approach are presented. \n\n\u00a9 American Institute of Physics 1988 \n\nDATA STRUCTURES \n\nThe functional components of a neural network model implemented in *Lisp are \nstored in a uniform parallel variable (pvar) data structure on the CM.  The data structure \nmay be viewed as columns of pvars.  Columns are given to all CM virtual processors. \nEach CM physical processor may support 16 virtual processors.  In the first approach \ndescribed, CM processors are used to represent the edge set of a model's graph \nstructure.  In the second approach described, each processor can represent a unit, an \noutgoing link, or an incoming link in a model's structure.  Movement of activation (or \nerror) through a model's interconnect structure is simulated by moving numeric values \nover the CM's hypercube.  Many such movements can result from the execution of a \nsingle CM macroinstruction.  The CM transparently handles message buffering and \ncollision resolution.  However, some care is required on the part of the user to ensure \nthat message traffic is distributed over enough processors so that messages don't stack \nup at certain processors, forcing the CM to sequentially handle large numbers of \nbuffered messages.  Each approach requires serial transfers of model parameters and \nstates over the communication channel between the host and the CM at certain times in a \nsimulation. 
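As a rough picture of the pvar columns described above, the following Python sketch models each pvar as a list with one slot per virtual processor, so that slot p of every list forms the column held by processor p. It is only an illustration under stated assumptions: the pvar names, the four-processor layout, and the numeric values are hypothetical, and ordinary Python lists stand in for CM memory and for *Lisp selection and reduction.

```python
# Each pvar is modeled as a list with one slot per (virtual) processor.
# Slot p of every list together forms the column held by processor p.
# All names and values are hypothetical, chosen only for illustration.
pvars = {
    'rid':    [3, 3, 4, 4],            # id of the unit receiving activation
    'sid':    [1, 2, 1, 2],            # id of the unit sending activation
    'weight': [0.5, -1.0, 2.0, 0.25],  # interconnect weight held by the processor
    'sact':   [1.0, 1.0, 1.0, 1.0],    # activation of the sending unit
}

def column(p):
    # Return the values held by processor p, one entry per pvar.
    return {name: values[p] for name, values in pvars.items()}

def weighted_sum(receiver):
    # Emulate three data-parallel steps: select the processors whose rid
    # matches, multiply weight by sact on each, then sum the products.
    selected = [p for p, r in enumerate(pvars['rid']) if r == receiver]
    products = [pvars['weight'][p] * pvars['sact'][p] for p in selected]
    return sum(products)

print(column(2))
print(weighted_sum(3))
```

On the CM the selection, the multiplication, and the sum each act on all selected processors at once; the list comprehensions here only imitate that behaviour serially.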
\n\nThe first approach, \"the edge list approach,\" distributes the edge list of a network \ngraph to the CM, one edge per CM processor.  Interconnect weights for each edge are \nstored in the memory of the processors.  An array on the host machine stores the \ncurrent activation for all units.  This approach may be considered to represent abstract \nsynapses on the CM.  The interconnect structure of a model is described by product \nsets on an ordered pair of identification (id) numbers, rid and sid.  The rid is the id of \nunits receiving activation and sid the id of units sending activation.  Each id is a unique \ninteger.  In a hierarchical network, the ids of input units are never in the set of rids and \nthe ids of output units are never in the set of sids.  Various set relations (e.g. inverse, \nreflexive, symmetric, etc.) defined over id ranges can be used as a high level \nrepresentation of a network's interconnect structure.  These relations can be translated \ninto pvar columns.  The limits to the interconnect complexity of a simulated model are \nthe virtual processor memory limits of the CM configuration used and the stack space \nrequired by functions used to compute the weighted sums of activation.  Fig. 1 shows an \nR^3 -> R^2 -> R^4 interconnect structure and its edge list representation on the CM. \n\n[Fig. 1 graphic: a nine-unit network diagram above a row of CM processors 0 to 13, one edge per processor; the SACT (a_j) row lists the sending units 1 2 3 1 2 3 4 5 4 5 4 5 4 5.] \n\nFig. 1.  Edge List Representation of an R^3 -> R^2 -> R^4 Interconnect Structure \n\nThis representation can use as few as six pvars for a model with Hebbian \nadaptation:  rid (i), sid (j), interconnect weight (w_ij), ract (a_i), sact (a_j), and learn rate \n(\u03b7).  Error back propagation requires the addition of:  error (e_i), old interconnect \nweight (w_ij(t-1)), and the momentum term (\u03b1).  
The receiver and sender unit \nidentification pvars are described above.  The interconnect weight pvar stores the \nweight for the interconnect.  The activation pvar, sact, stores the current activation, a_j, \ntransferred to the unit specified by rid from the unit specified by sid.  The activation \npvar, ract, stores the current weighted activation a_j w_ij.  The error pvar stores the error \nfor the unit specified by the sid.  A variety of proclaims (e.g. integer, floating point, \nboolean, and field) exist in *Lisp to define the type and size of pvars.  Proclaims \nconserve memory and speed up execution.  Using a small number of pvars limits the \namount of memory used in each CM processor so that maximum virtualization of the \nhardware processors can be realized.  Any neural model can be specified in this fashion. \nSigma-pi models require that multiple input activation pvars be specified.  Some edges may \nhave a different number of input activation pvars than others.  To maintain the uniform \ndata structure of this approach, a tag pvar has to be used to determine which input \nactivation pvars are in use on a particular edge. \n\nThe edge list approach allows the structure of a simulated model to \"physically\" \nchange because edges may be added (up to the virtual processor limit) or deleted at any \ntime without affecting the operation of the control structure.  Edges may also be placed \nin any processor because the subselection (on rid or sid) operation performed before a \nparticular update operation ensures that all processors (edges) with the desired units are \nselected for the update. \n\nThe second simulation approach, \"the composite approach,\" uses a more \ncomplicated data structure where units, incoming links, and outgoing links are \nrepresented.  Update routines for this approach use parallel segmented scans to form \nthe weighted sum of input activation.  
Parallel segmented scans allow a MIMD-like \ncomputation of the weighted sums for many units at once.  Pvar columns have unique \nvalues for unit, incoming link, and outgoing link representations.  The data structures \nfor input units, hidden units, and output units are composed of sets of the three pvar \ncolumn types.  Fig. 2 shows the representation for the same model as in Fig. 1 \nimplemented with the composite approach. \n\n[Fig. 2 graphic: the nine units with their incoming and outgoing links laid out across CM processors numbered from 0 to 35, with segmented scan icons above the structure and transfer address arrows below it.] \n\nFig. 2.  Composite Representation of an R^3 -> R^2 -> R^4 Interconnect Structure \n\nIn Fig. 2, CM processors acting as units, outgoing links, and incoming links are \nrepresented respectively by circles, triangles, and squares.  CM cube address pointers \nused to direct the parallel transfer of activation are shown by arrows below the \nstructure.  These pointers define the model interconnect mapping.  Multiple sets of \nthese pointers may be stored in separate pvars.  Segmented scans are represented by \noperation-arrow icons above the structure.  A basic composite approach pvar set for a \nmodel with Hebbian adaptation is:  forward B, forward A, forward transfer address, \ninterconnect weight (w_ij), act-1 (a_i), act-2 (a_j), threshold, learn rate (\u03b7), current unit id \n(i), attached unit id (j), level, and column type.  Back propagation of error requires the \naddition of:  backward B, backward A, backward transfer address, error (e_i), previous \ninterconnect weight (w_ij(t-1)), and the momentum term (\u03b1).  The forward and \nbackward boolean pvars control the segmented scanning operations over unit \nconstructs.  Pvar A of each type controls the plus scanning and pvar B of each type \ncontrols the copy scanning.  
The forward transfer pvar stores cube addresses for \nforward (ascending cube address) parallel transfer of activation.  The backward transfer \npvar stores cube addresses for backward (descending cube address) parallel transfer of \nerror.  The interconnect weight, activation, and error pvars have the same functions as \nin the edge list approach.  The current unit id stores the current unit's id number.  The \nattached unit id stores the id number of an attached unit.  This is the edge list of the \nnetwork's graph.  The contents of these pvars only have meaning in link pvar columns. \nThe level pvar stores the level of a unit in a hierarchical network.  The type pvar stores \na unique arbitrary tag for the pvar column type.  These last three pvars are used to \nsubselect processor ranges to reduce the number of processors involved in an \noperation. \n\nAgain, edges and units can be added or deleted.  Processor memories for deleted \nunits are zeroed out.  A new structure can be placed in any unused processors.  The \nlevel, column type, current unit id, and attached unit id values must be consistent with \nthe desired model interconnectivity. \n\nThe number of CM virtual processors required to represent a given model on the \nCM differs for each approach.  Given N units and N(N-1) non-zero interconnects (e.g. \na symmetric model), the edge list approach simply distributes N(N-1) edges to N(N-1) \nCM virtual processors.  The composite approach requires two virtual processors for \neach interconnect and one virtual processor for each unit, or N+2N(N-1) CM virtual \nprocessors total.  The difference between the number of processors required by the two \napproaches is N^2.  Table I shows the processor and CM virtualization requirements for \neach approach over a range of model sizes. \n\nTABLE I  Model Sizes and CM Processors Required \n\nRun No.  Grid Size  Number of Units  Edge List N(N-1)  Quarter-CM Virt. Procs.  Virt. Level \n1        8^2        64               4032              8192                     0 \n2        9^2        81               6480              8192                     0 \n3        11^2       121              14520             16384                    0 \n4        13^2       169              28392             32768                    2 \n5        16^2       256              65280             65536                    4 \n6        19^2       361              129960            131072                   8 \n\nRun No.  Grid Size  Number of Units  Composite N+2N(N-1)  Quarter-CM Virt. Procs.  Virt. Level \n7        8^2        64               8128                 8192                     0 \n8        9^2        81               13041                16384                    0 \n9        11^2       121              29161                32768                    2 \n10       13^2       169              56953                65536                    4 \n11       16^2       256              130816               131072                   8 \n12       19^2       361              260281               262144                   16 \n\nCONTROL STRUCTURES \n\nThe control code for neural network simulations (in *Lisp or C*) is stored and \nexecuted sequentially on a host computer (e.g. Symbolics 36xx and VAX 86xx) \nconnected to the CM by a high speed communication line.  Neural network simulations \nexecuted in *Lisp use a small subset of the total instruction set:  processor selection \nreset (*all), processor selection (*when), parallel content assignment (*set), global \nsummation (*sum), parallel multiplication (*!!), parallel summation (+!!), parallel \nexponentiation (exp!!), the parallel global memory references (*pset) and (pref!!), and \nthe parallel segmented scans (copy!! and +!!).  Selecting CM processors puts them in a \n\"list of active processors\" (loap) where their contents may be arithmetically manipulated \nin parallel.  Copies of the list of active processors may be made and used at any time.  A \nsubset of the processors in the loap may be \"subselected\" at any time, reducing the loap \ncontents.  The processor selection reset clears the current selected set by setting all \nprocessors as selected.  Parallel content assignment allows pvars in the currently \nselected processor set to be assigned allowed values in one step.  Global summation \nexecutes a tree reduction sum across the CM processors by grid or cube address for \nparticular pvars.  
Parallel multiplications and additions multiply and add pvars for all \nselected CM processors in one step.  The parallel exponential applies the function, e^x, to \nthe contents of a specified pvar, x, over all selected processors.  Parallel segmented \nscans apply two functions, copy!! and +!!, to subsets of CM processors by scanning \nacross grid or cube addresses.  Scanning may be forward or backward (i.e. by \nascending or descending cube address order, respectively). \n\nFigs. 3 and 4 show the edge list approach kernels required for Hebbian learning for \nan R^2 -> R^2 model.  The loop construct in Fig. 3 drives the activation update \noperation (1).  The usual loop to compute each weighted sum for a particular unit has been \nreplaced by four parallel operations:  a selection reset (*all), a subselection of all the \nprocessors for which the particular unit is a receiver of activation (*when (=!! rid (!! \n(1+ u)))), a parallel multiplication (*!! weight sact), and a tree reduction sum (*sum \n...).  Activation is spread for a particular unit, to all others it is connected to, by \nstoring the newly computed activation in an array on the host, then subselecting the \nprocessors where the particular unit is a sender of activation (*when (=!! sid (!! (1+ \nu)))), and broadcasting the array value on the host to those processors. \n\n(dotimes (u 4) \n  (*all (*when (=!! rid (!! (1+ u))) \n    (setf (aref activation u) \n          (some-nonlinearity (*sum (*!! weight sact)))) \n    (*set ract (!! (aref activation u))))) \n  (*all (*when (=!! sid (!! (1+ u))) \n    (*set sact (!! (aref activation u)))))) \n\nFig. 3. Activation Update Kernel for the Edge List Approach. \n\nFig. 4 shows the kernel for the Hebbian weight update (2). \n\n(*all \n  (*set weight \n        (*!! learn-rate ract sact))) \n\nFig. 4. 
Hebbian Weight Modification Kernel for the Edge List Approach \n\nThe edge list activation update kernel is essentially serial because the steps involved can \nonly be applied to one unit at a time.  The weight modification is parallel.  For error \nback propagation a separate loop for computing the errors for the units on each layer of \na model is required.  Activation update and error back propagation also require transfers \nto and from arrays on the host on every iteration step, incurring a concomitant overhead. \n\nOther common computations used for neural networks can be computed in parallel \nusing the edge list approach.  Fig. 5 shows the code kernel for parallel computation of \nthe Lyapunov energy equation (3), where i = 1 to the number of units (N). \n\n(+ (* -.5 (*sum (*!! weight ract sact))) (*sum (*!! input sact))) \n\nFig. 5. Kernel for Computation of the Lyapunov Energy Equation \n\nAlthough an input pvar, input, is defined for all edges, it is only non-zero for those \nedges associated with input units.  Fig. 6 shows the pvar structure for parallel \ncomputation of a Hopfield weight prescription, with segmented scanning to produce the \nweights in one step, \n\n$w_{ij} = \\sum_{r=1}^{S} (2a_i^r - 1)(2a_j^r - 1)$    (4) \n\nwhere w_ii = 0, w_ij = w_ji, and r = 1 to the number of patterns, S, to be stored. \n\n[Fig. 6 layout, one column per processor: seg holds the t/n segment flags; ract holds v11 v21 ... vS1, v11 v21 ... vS1, ...; sact holds v12 v22 ... vS2, v13 v23 ... vS3, ...; weight holds w12, w13, ..., one per segment.] \n\nFig. 6.  Pvar Structure for Parallel Computation of Hopfield Weight Prescription \n\nFig. 7 shows the *Lisp kernel used on the pvar structure in Fig. 6. \n\n(*set weight \n  (scan '+!! (*!! (-!! (*!! ract (!! 2)) (!! 1)) (-!! (*!! sact (!! 2)) (!! 1))) \n        :segment-pvar seg :include-self t)) \n\nFig. 7.  
Parallel Computation of Hopfield Weight Prescription \n\nThe inefficiencies of the edge list activation update are solved by the updating \nmethod used in the composite approach.  Fig. 8 shows the *Lisp kernel for activation \nupdate using the composite approach.  Fig. 9 shows the *Lisp kernel for the Hebbian \nlearning operation in the composite approach. \n\n(*all \n  (*when (=!! level (!! 1)) \n    (*set act-1 (scan!! act-1 'copy!! :segment-pvar forwardb :include-self t)) \n    (*set act-1 (*!! act-1 weight)) \n    (*when (=!! type (!! 2)) (*pset :overwrite act-1 act-1 ftransfer))) \n  (*when (=!! level (!! 2)) \n    (*set act-1 (scan!! act-1 '+!! :segment-pvar forwarda :include-self t)) \n    (*when (=!! type (!! 1)) (some-nonlinearity!! act-1)))) \n\nFig. 8. Activation Update Kernel for the Composite Approach \n\n(*all \n  (*set act-1 (scan!! act-1 'copy!! :segment-pvar forwardb \n                      :include-self t)) \n  (*when (=!! type (!! 2)) \n    (*set act-2 (pref!! act-1 btransfer)) \n    (*set weight \n          (+!! weight \n               (*!! learn-rate act-1 act-2))))) \n\nFig. 9. Hebbian Weight Update Kernel for the Composite Approach \n\nIt is immediately obvious that no looping is involved.  Any number of interconnects \nmay be updated by the proper subselection.  However, the more subselection is used, \nthe less efficient the computation becomes because fewer processors are involved. \n\nCOMPLEXITY ANALYSIS \n\nThe performance results presented in the next section can be largely anticipated \nfrom an analysis of the space and time requirements of the CM implementation \napproaches.  For simplicity I use an R^n -> R^n model with Hebbian adaptation.  The \norder of magnitude requirements for activation and weight updating are compared for \nboth CM implementation approaches and a basic serial matrix arithmetic approach. \n\nFor the given model the space requirements on a conventional serial machine are \n2n+n^2 locations or O(n^2). 
 The growth of the space requirement is dominated by the \nn x n weight matrix defining the system interconnect structure.  The edge list approach \nuses six pvars for each processor and uses n x n processors for the mapping, or 6n^2 \nlocations or O(n^2).  The composite approach uses 11 pvars.  There are 2n processors \nfor units and 2n^2 processors for interconnects in the given model.  The composite \napproach uses 11(2n+2n^2) locations or O(n^2).  The CM implementations take up \nroughly the same space as the serial implementation, but the space for the serial \nimplementation is composed of passive memory whereas the space for the CM \nimplementations is composed of interconnected processors with memory. \n\nThe time analysis for the approaches compares the time order of magnitudes to \ncompute the activation update (1) and the Hebbian weight update (2).  On a serial \nmachine, the n weighted sums computed for the activation update require n^2 \nmultiplications and n(n-1) additions.  There are 2n^2-n operations or time order of \nmagnitude O(n^2).  The time order of magnitude for the weight matrix update is O(n^2) \nsince there are n^2 weight matrix elements. \n\nThe edge list approach forms n weighted sums by performing a parallel product of \nall of the weights and activations in the model, (*!! weight sact), and then a tree \nreduction sum, (*sum ...), of the products for the n units (see Fig. 3).  There are \n1+n(n log2 n) operations or time order of magnitude O(n^2).  This is the same order of \nmagnitude as obtained on a serial machine.  Further, the performance of the activation \nupdate is a function of the number of interconnects to be processed. \n\nThe composite approach forms n weighted sums in nine steps (see Fig. 8):  five \nselection operations; the segmented copy scan before the parallel multiplication; the \nparallel multiplication; the parallel transfer of the products; and the segmented plus \nscan, which forms the n sums in one step.  
This gives the composite activation update a \ntime order of magnitude O(1).  Performance is independent of the number of \ninterconnects processed.  The next section shows that this is not quite true. \n\nThe n^2 weights in the model can be updated in three parallel steps using the edge \nlist approach (see Fig. 4).  The n^2 weights in the model can be updated in eight parallel \nsteps using the composite approach (see Fig. 9).  In either case, the weight update \noperation has a time order of magnitude O(1). \n\nThe time complexity results obtained for the composite approach apply to \ncomputation of the Lyapunov energy equation (3) and the Hopfield weighting \nprescription (4), given that pvar structures which can be scanned (see Figs. 1 and 6) are \nused.  The same operations performed serially are time order of magnitude O(n^2). \n\nThe above operations all incur a one-time overhead cost for generating the addresses \nin the pointer pvars, used for parallel transfers, and arranging the values in segments \nfor scanning.  What the above analysis shows is that time complexity is traded for \nspace complexity.  The goal of CM programming is to use as many processors as \npossible at every step. \n\nPERFORMANCE COMPARISON \n\nSimulations of a Hopfield spin-glass model [2] were run for six different model sizes \nover the same number (16,384) of physical CM processors to provide a performance \ncomparison between implementation approaches.  The Hopfield network was chosen \nfor the performance comparison because of its simple and well known convergence \ndynamics and because it uses a small set of pvars which allows a wide range of \nnetwork sizes (degrees of virtualization) to be run.  Twelve treatments were run:  six with \nthe edge list approach and six with the composite approach.  Table I shows the \nmodel sizes run for each treatment.  
Each treatment was run at the virtualization level \njust necessary to accommodate the number of processors required for each simulation. \n\nTwo exemplar patterns are stored.  Five test patterns are matched against the two \nexemplars.  Two test patterns have their centers removed, two have a row and column \nremoved, and one is a random pattern.  Each exemplar was hand picked and tested to \nensure that it did not produce cross-talk.  The number of rows and columns in the \nexemplars and patterns increases as the size of the networks for the treatments increases. \nSince the performance of the CM is at issue, rather than the performance of the network \nmodel used, a simple model and a simple pattern set were chosen to minimize \nconsideration of the influence of model dynamics on performance. \n\nPerformance is presented by plotting execution speed versus model size.  Size is \nmeasured by the number of interconnects in a model.  The execution speed metric is \ninterconnects updated per second, N(N-1)/t, where N is the number of units in a \nmodel and t is the time used to update the activations for all of the units in a model.  All \nof the units were updated three times for each pattern.  Convergence was determined \nby the output activation remaining stable over the final two updates.  The value of t for \na treatment is the average of 15 samples of t.  Fig. 10 shows the activation update cycle \ntime for both approaches.  Fig. 11 shows the interconnect update speed plots for both \napproaches.  The edge list approach is plotted in black.  The composite approach is \nplotted in white.  The performance shown excludes overhead for interpretation of the \n*Lisp instructions.  The model size categories for each plot correspond to the model \nsizes and levels of CM virtualization shown in Table I. 
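The processor counts in Table I and the speed metric defined above are simple arithmetic, and the following Python sketch reproduces them. It is an illustration rather than code from the paper: the function names are hypothetical, and the timing value passed to the speed metric is made up purely to show the calculation, not a measured result.

```python
def edge_list_processors(n):
    # One virtual processor per interconnect: N(N-1).
    return n * (n - 1)

def composite_processors(n):
    # One processor per unit plus two per interconnect: N + 2N(N-1).
    return n + 2 * n * (n - 1)

def interconnects_per_second(n, t):
    # Speed metric from the text: N(N-1)/t, with t in seconds.
    return n * (n - 1) / t

# Unit counts for the six model sizes listed in Table I.
for n in (64, 81, 121, 169, 256, 361):
    edge = edge_list_processors(n)
    comp = composite_processors(n)
    # The two requirements differ by exactly N squared.
    print(n, edge, comp, comp - edge == n * n)

# Hypothetical cycle time of 0.5 s, used only to show the arithmetic.
print(interconnects_per_second(64, 0.5))
```

Comparing each count against the 16,384 physical processors shows where virtualization starts to be needed for each approach.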
\n\n[Fig. 10 plot: Activation Update Cycle Time vs Model Size; vertical axis secs (0 to 1.6), horizontal axis Model Size (1 to 6).] \n\nFig. 10. Activation Update Cycle Times \n\n[Fig. 11 plot: Interconnect Update Speed Comparison, Edge List Approach vs. Composite Approach; vertical axis i.p.s. (0 to 2000000), horizontal axis Model Size (1 to 6).] \n\nFig. 11.  Edge List Interconnect Update Speeds \n\nFig. 11 shows an order of magnitude performance difference between the \napproaches and a roll off in performance for each approach as a function of the number \nof virtual processors supported by each physical processor.  The performance turn \naround is at 4x virtualization for the edge list approach and 2x virtualization for the \ncomposite approach. \n\nCONCLUSIONS \n\nRepresenting the interconnect structure of neural network models with mappings \ndefined over the set of fine grain processors provided by the CM architecture provides \ngood performance for a modest programming effort utilizing only a small subset of the \ninstructions provided by *Lisp.  Further, the performance will continue to scale up \nlinearly as long as not more than 2x virtualization is required.  While the complexity \nanalysis of the composite activation update suggests that its performance should be \nindependent of the number of interconnects to be processed, the performance results \nshow that the performance is indirectly dependent on the number of interconnects to be \nprocessed because the level of virtualization required (after the physical processors are \nexhausted) is dependent on the number of interconnects to be processed and \nvirtualization decreases performance linearly. 
The complexity analysis of the edge list \nactivation update shows that its performance should be roughly the same as serial \nimplementations on comparable machines.  The results suggest that the composite \napproach is to be preferred over the edge list approach but should not be used at a virtualization \nlevel higher than 2x. \n\nThe mechanism of the composite activation update suggests that hierarchical \nnetworks simulated in this fashion will compare in performance to single layer \nnetworks because the parallel transfers provide a type of pipeline for activation for \nsynchronously updated hierarchical networks while providing simultaneous activation \ntransfers for asynchronously updated single layer networks.  Researchers at Thinking \nMachines Corporation and the M.I.T. AI Laboratory in Cambridge, Mass. use a similar \napproach for an implementation of NETtalk.  Their approach overlaps the weights of \nconnected units and simultaneously pipelines activation forward and error backward [3]. \n\nPerformance better than that presented can be gained by translation of the control \ncode from interpreted *Lisp to PARIS and use of the CM2.  In addition to not being \ninterpreted, PARIS allows explicit control over important registers that aren't \naccessible through *Lisp.  The CM2 will offer a number of new features which will \nenhance performance of neural network simulations:  a *Lisp compiler, larger \nprocessor memory (64K), and floating point processors.  The compiler and floating \npoint processors will increase execution speeds while the larger processor memories \nwill provide a larger number of virtual processors at the performance turn around points, \nallowing higher performance through higher CM utilization. \n\nREFERENCES \n\n1. \"Introduction to Data Level Parallelism,\" Thinking Machines Technical Report \n86.14, (April 1986). \n\n2.  Hopfield, J. J., \"Neural networks and physical systems with emergent collective \ncomputational abilities,\" Proc. Natl. Acad. 
Sci., Vol. 79, (April 1982), pp. 2554-2558. \n\n3.  Blelloch, G. and Rosenberg, C.  Network Learning on the Connection Machine, \nM.I.T. Technical Report, 1987. \n\n", "award": [], "sourceid": 89, "authors": [{"given_name": "Nathan", "family_name": "Brown", "institution": null}]}