{"title": "Spreading Activation over Distributed Microfeatures", "book": "Advances in Neural Information Processing Systems", "page_first": 553, "page_last": 559, "abstract": null, "full_text": "553 \n\nSPREADING ACTIVATION OVER \nDISTRIBUTED MICROFEATURES \n\n* \nJames Hendler \n\nDepart.ment, of Computer Science \n\nUniversity of Maryland \nCollege Park, MD 20742 \n\nABSTRACT \n\nOne att\u00b7empt at explaining human inferencing is that of spread(cid:173)\ning activat,ion, particularly in the st.ructured connectionist para(cid:173)\ndigm. This has resulted in t.he building of systems with semanti(cid:173)\ncally nameable nodes which perform inferencing by examining \nt.he pat,t.erns of activation spread. In this paper we demonst.rate \nt.hat simple structured network infert'ncing can be p(>rformed by \npassing art.iva.t.ion over the weights learned by a distributed alga(cid:173)\nrit,hm. Thus , an account, is provided which explains a well(cid:173)\nbehaved rela t ionship bet.ween structured and distri butt'd conn('c(cid:173)\nt.ionist. a.pproachrs. \n\nINTRODUCTION \n\nA primar~\u00b7 difference brtween t,he nPllral net.works of 20 years ago and t.he \n(\"urrent genera Lion of connect,ionist models is t.he addit.ion of mechanisms whic h \npermit t.he s),st,em to create all internal represent,ation. These subsymbolic, \nsemantica.lly unnameable, feat.urrs which a.re induced by connectionist. learning \nalgorithms havr been discussed as bt,ing of import. bot,h in structured and distri(cid:173)\nbut.ed ronnl\"ctionist nrtworks (cf. Feldman and Balla.rd , 1982; Rumelhart and \nMcClelland, 198(j). The fact that network learning algorit.hms can creatr these \nrm\u00b7cro!eal'ure,s is not, however. enough in itself t.o aC('Qunt for how rognition \nworks. Most. of what, we call int.elligent thought. dt'rives from being able t,o rea(cid:173)\n:son about. t.he relatioll:::> I)t'tween object.s, to hypothesize about event.s a.nd things, \netc. 
If we are to do cognitive modeling we must complete the story by explaining how networks can reason in the way that humans (or other intelligent beings) do. \n\nOne attempt at explaining such reasoning is that of spreading activation in the structured connectionist and marker-passing (cf. Charniak, 1983; Hendler, 1987) approaches. In these systems semantically nameable nodes permit an energy spread, and reasoning about the world is accounted for by looking either at stable configurations of the activation (the structured connectionist approach) or at the paths found by examining intersections among the nodes (the marker-passing technique). In this paper we will demonstrate that simple structured-network-like inferencing can be performed by passing activation over the weights learned by a distributed algorithm. Thus, an account is provided which explains a well-behaved relationship between structured and distributed connectionist approaches. \n\n* The author is also affiliated with the Institute for Advanced Computer Studies and the Systems Research Center at the University of Maryland. Funding for this work was provided in part by Office of Naval Research Grant N00014-88-K-0560. \n\nTHE SPREADING ACTIVATION MODEL \n\nIn this paper we will demonstrate that local connectionist-like networks can be built by spreading activation over the microfeatures learned by a distributed network. To show this, we start with a simple example which demonstrates the activation-spreading mechanism used. The particular network we will use in this example is a 6-3-8 three-layer network trained by the back-propagation learning algorithm. The training set used is shown in table 1. 
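\nAs a rough sketch (not code from the original work), the spreading rule used in this example, in which an activation of 1 at a source output node is divided by the node's outbranching, passed over the learned weights to the hidden units, and then collected at each other output unit, can be written as follows; the weight values are those reported in figure 1. \n

```python
# Illustrative sketch (not from the original paper) of the
# activation-spreading rule in the 6-3-8 example.
# The weight values below are those reported in figure 1.

W = {  # learned weights between output nodes n1..n8 and hidden units h1..h3
    'n1': [-4.98,  4.40, -2.82],
    'n2': [-6.99, -4.99, -2.23],
    'n3': [-6.11,  3.49,  0.30],
    'n4': [-6.37, -4.68,  2.53],
    'n5': [ 4.36,  3.73, -5.09],
    'n6': [ 4.38, -5.97, -3.67],
    'n7': [ 0.89,  1.07,  3.32],
    'n8': [ 3.88, -6.95,  1.88],
}

def spread(source, target, weights):
    # Start with strength 1 at the source node; divide by its
    # outbranching (the 3 links to the hidden units) and multiply
    # by each link weight ...
    hidden = [w / len(weights[source]) for w in weights[source]]
    # ... then re-spread to the target output node, dividing by
    # the outbranching over the 8 output units.
    return sum(a * w for a, w in zip(hidden, weights[target])) / len(weights)

# spread('n1', 'n2', W) comes out at about .80, matching the value
# computed in the text; the rule is symmetric in source and target.
```
\n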
The weights between the output nodes and hidden units which are learned by the network (after learning to the 90% level for a typical run) are shown in figure 1. \n\nTABLE 1. Training Set for Example 1. \n\nInput Pattern   Output Pattern \n000000          10000000 \n000011          01000000 \n001100          00100000 \n001111          00010000 \n110000          00001000 \n110011          00000100 \n111100          00000010 \n111111          00000001 \n\nWeights \n\n       h1     h2     h3 \nn1   -4.98   4.40  -2.82 \nn2   -6.99  -4.99  -2.23 \nn3   -6.11   3.49   0.30 \nn4   -6.37  -4.68   2.53 \nn5    4.36   3.73  -5.09 \nn6    4.38  -5.97  -3.67 \nn7    0.89   1.07   3.32 \nn8    3.88  -6.95   1.88 \n\nFigure 1. Weights Learned by Back Propagation \n\nTo understand how the activation spreads, let us examine what occurs when activation is started at node n1 with a weight of 1. This activation strength is divided by the outbranching of the node and then multiplied by the weight of each link to the hidden units. Thus activation flows from n1 to h1 with a strength of 1/3 * Weight(n1,h1). A similar computation is made to each of the other hidden units. This activation now spreads to each of the other output nodes in turn. Thus, n2 would gain activation of \n\nActivation(h1) * Weight(n2,h1)/8 + \nActivation(h2) * Weight(n2,h2)/8 + \nActivation(h3) * Weight(n2,h3)/8 \n\nor .80 from n1. \n\nTable 2 shows a graph of the activation spread between the output units. The table, which is symmetric, can thus be read as showing the output at each of the other units when an activation strength of 1 is placed at the named node. Looking at the table we see that the highest activation occurs among nodes which share the most features of the input (i.e. 
same value and position) while the lowest is seen among those patterns sharing the fewest features. \n\nHowever, as well as having this property, table 2 can be seen as providing a matrix which specifies the weights between the output nodes if viewed as a structured network. That is, n1 is connected to n2 by a strength of +.80, to n3 by a strength of +1.03, etc. Thus, by using this technique distributed representations can be turned into connectivity weights for structured networks. When non-orthogonal weights are used, the same activation-spreading algorithm produces a structured network which can be used for more complex inferencing than can the distributed network alone. \n\nWe demonstrate this by a simple, and again contrived, example. This example is motivated by Gary Cottrell's structured model for word sense disambiguation (Cottrell, 1985). Cottrell, using weights derived by hand, demonstrated that a structured connectionist network could distinguish both word-sense and case-slot assignments for ambiguous lexical items. Presented with the sentence \"John threw the fight,\" the system would activate a node for one meaning of \"throw;\" presented with \"John threw the ball\" it would come up with another. The nodes of Cottrell's network included words (John, Threw, etc.), word senses (John1, Propel, etc.) and case-slots (TAGT (agent of the Throw), PAGT (agent of the Propel), etc.). \n\nTABLE 2. Activation Spread in 6-3-8 Network. \n\n       n1     n2     n3     n4     n5     n6     n7     n8 \nn1      *    .80   1.03    .17    .38  -1.57   -.38  -2.30 \nn2    .80      *   1.02   2.60  -1.57    .31   -.79    .14 \nn3   1.03   1.02      *    .97   -.63  -2.03   -.03  -1.97 \nn4    .17   2.60    .97      *  -2.42   -.38   -.09    .52 \nn5    .38  -1.57   -.63  -2.42      *    .64   -.38   -.77 \nn6  -1.57    .31  -2.03   -.38    .64      *   -.60   2.14 \nn7   -.38   -.79   -.03   -.09   -.38   -.60      *    .09 \nn8  -2.30    .14  -1.97    .52   -.77   2.14    .09      * \n\nTo duplicate Gary's network via training, we presented a 3-layer backprop network with a training set in which distributed patterns, very loosely corresponding to a \"dictionary\" of word encodings(1), were associated with a vector representing each of the individual nodes which would be represented in Cottrell's system, but with no structure. Thus, each element in the training set is a 16-bit vector (representing a four-word sentence, each word as a 4-bit pattern), associated with another 16-bit vector representing the nodes \n\nBob1 John1 propel throw fight1 ball1 pagt pobj tagt tobj bob john threw the fight ball \n\nFor this example, the system was trained on the encodings of the four sentences \n\nJohn threw the ball \nJohn threw the fight \nBob threw the ball \nBob threw the fight \n\nwith the output set high for those objects in the second vector which were appropriately associated, as shown in Table 3. \n\n(1) Which in any realistic system would some day be replaced by actual signal-processing outputs or other representations of actual word pronunciation forms. \n\nTABLE 3. Training Set for Example 2. \n\nInput Pattern        Output Pattern \n0110 0001 0101 0010  0110 0111 0001 1101 \n0110 0001 0101 1010  0101 1000 1101 1110 \n1001 0001 0101 0010  1010 0111 0010 1101 \n1001 0001 0101 1010  1001 1000 1110 1110 \n\nUpon completion of the learning, the activation-spreading algorithm was used to derive a table of connectivity weights between the output units, as shown in table 4. \n\nThese weights were then transferred into a local connectionist simulator and a very simple activation-spreading model was used to examine the results. When we run the simulator, using the activation spreading over learned weights, exactly the results produced by Cottrell's network are seen. Thus: \n\nActivation from the nodes corresponding to john, threw, the, and fight causes a positive activation at the node for \"Throw\" and a negative activation at the node for \"Propel,\" \n\nwhile \n\nActivation from john threw the ball spreads positively to \"Propel\" and not to \"Throw.\" \n\nFurther, other effects which are also predicted by Cottrell's model are seen: \n\nActivation at TAGT and TOBJ spreads positive activation to Throw and not to Propel, \n\nand \n\nActivation at PAGT and POBJ causes a spread to Propel but not to Throw. \n\nTABLE 4. Connectivity Weights for Example 2. 
\n\n[Table 4 data: the 16 x 16 matrix of connectivity weights derived between the output units, with *** on the diagonal; off-diagonal entries lie roughly between -0.12 and 0.20.] \n\nWe believe that results like this one may argue that structured networks are integrally linked to distributed networks, in that distributed-network learning techniques may provide a fundamental basis for explaining the cognitive development of structured networks. In addition, we see that simple inferential reasoning can be produced using purely connectionist models. \n\nCONCLUDING REMARKS \n\nWe have attempted to show that a model using an activation-spreading variant can be used to take learned connectionist models and perform some limited forms of inferencing upon them. Further, we have argued that this technique may provide a computational model in which structured networks can be learned, and that structured networks provide the inferencing capabilities missing in purely distributed models. However, before we can truly further this claim, significant work remains to be done. We must extend and explore such models, particularly examining whether these types of techniques can be extended to handle the complexity that can be found in real-world problems and serious cognitive models. \n\nIn particular we are beginning an examination of two crucial issues. First, will the technique described above work for realistic problems? In particular, can the inferencing be designed to impact on the recognition by the distributed network? If so, one could see, for example, a speech recognition program coupled to a system like Cottrell's natural language system, providing a handle for a text understanding system. Similarly, such a technique might allow the integration of top-down and bottom-up processing for vision and other such signal-processing tasks. \n\nSecondly, 
we wish to see if more complex spreading-activation models could be hooked to this type of model. Could networks such as those proposed by Shastri (1985), Diederich (1985), and Pollack and Waltz (1982), which provide complex inferencing but require more structure than simply weights between units, be abstracted out of the learned weights? Two particular areas currently being pursued by the author, for example, focus on active inhibition models for determining whether portions of the network can be suppressed to provide more complex inferencing, and the learning of structures given temporally ordered information. \n\nReferences \n\nCharniak, E. Passing markers: A theory of contextual influence in language comprehension. Cognitive Science, 7(3), 1983, 171-190. \n\nCottrell, G.W. A Connectionist Approach to Word-Sense Disambiguation. Doctoral Dissertation, Computer Science Dept., University of Rochester, 1985. \n\nDiederich, J. Parallelverarbeitung in netzwerk-basierten Systemen. PhD Dissertation, Dept. of Linguistics, University of Bielefeld, 1985. \n\nFeldman, J.A. and Ballard, D.H. Connectionist models and their properties. Cognitive Science, 6, 1982, 205-254. \n\nHendler, J.A. Integrating Marker-passing and Problem Solving: A spreading activation approach to improved choice in planning. Lawrence Erlbaum Associates, N.J., November 1987. \n\nPollack, J.B. and Waltz, D.L. Natural Language Processing using spreading activation and lateral inhibition. Proceedings of the Fourth International Conference of the Cognitive Science Society, 1982, 50-58. \n\nRumelhart, D.E. and McClelland, J.L. (eds.) Parallel Distributed Processing. Cambridge, MA: MIT Press, 1986. \n\nShastri, L. Evidential Reasoning in Semantic Networks: A formal theory and its parallel implementation. Doctoral Dissertation, Computer Science Department, University of Rochester, Sept. 1985. \n", "award": [], "sourceid": 98, "authors": [{"given_name": "James", "family_name": "Hendler", "institution": null}]}