{"title": "Distributed Recursive Structure Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 591, "page_last": 597, "abstract": null, "full_text": "Distributed Recursive Structure Processing \n\nGeraldine Legendre \nDepartment of Linguistics \n\nYoshiro Miyata \nOptoelectronic Computing Systems Center \n\nPaul Smolensky \nDepartment of Computer Science \n\nUniversity of Colorado \nBoulder, CO 80309-0430* \n\nAbstract \n\nHarmonic grammar (Legendre, et al., 1990) is a connectionist theory of linguistic well-formedness based on the assumption that the well-formedness of a sentence can be measured by the harmony (negative energy) of the corresponding connectionist state. Assuming a lower-level connectionist network that obeys a few general connectionist principles but is otherwise unspecified, we construct a higher-level network with an equivalent harmony function that captures the most linguistically relevant global aspects of the lower-level network. In this paper, we extend the tensor product representation (Smolensky 1990) to fully recursive representations of recursively structured objects like sentences in the lower-level network. We show theoretically and with an example the power of the new technique for parallel distributed structure processing. \n\n1 Introduction \n\nA new technique is presented for representing recursive structures in connectionist networks. It has been developed in the context of the framework of Harmonic Grammar (Legendre et al. 1990a, 1990b), a formalism for theories of linguistic well-formedness which involves two basic levels: at the lower level, elements of the problem domain are represented as distributed patterns of activity in a network; at the higher level, the elements in the domain are represented locally and connection weights are interpreted as soft rules involving these elements. 
There are two aspects that are central to the framework. \n\n*The authors are listed in alphabetical order. \n\n591 \n\nFirst, the connectionist well-formedness measure harmony (or negative \"energy\"), which we use to model linguistic well-formedness, has the properties that it is preserved between the lower and the higher levels and that it is maximized in the network processing. Our previous work developed techniques for deriving harmonies at the higher level from linguistic data, which allowed us to make contact with existing higher-level analyses of a given linguistic phenomenon. \n\nThis paper concentrates on the second aspect of the framework: how particular linguistic structures such as sentences can be efficiently represented and processed at the lower level. The next section describes a new method for representing tree structures in a network which is an extension of the tensor product representation proposed in (Smolensky 1990) that allows recursive tree structures to be represented and various tree operations to be performed in parallel. \n\n2 Recursive tensor product representations \n\nA tensor product representation of a set of structures S assigns to each s ∈ S a vector built up by superposing role-sensitive representations of its constituents. A role decomposition of S specifies the constituent structure of s by assigning to it an unordered set of filler-role bindings. For example, if S is the set of strings from the alphabet {a, b, c} and s = cba, then we might choose a role decomposition in which the roles are absolute positions in the string (r1 = first, r2 = second, ...) and the constituents are the filler/role bindings {b/r2, a/r3, c/r1}.¹ 
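As a concrete sketch of this positional role decomposition, the string cba can be built and queried as follows (Python; the orthonormal filler and role vectors are illustrative choices, not the paper's):

```python
fillers = {"a": (1, 0, 0), "b": (0, 1, 0), "c": (0, 0, 1)}
roles   = {1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}

def bind(f, r):
    """The binding f/r as the outer product f x r: a rank-2 tensor v[phi][rho]."""
    return [[fv * rv for rv in r] for fv in f]

def superpose(*tensors):
    """Sum the bindings elementwise: the representation of the whole string."""
    return [[sum(t[i][j] for t in tensors) for j in range(len(tensors[0][0]))]
            for i in range(len(tensors[0]))]

def unbind(v, r):
    """Contract with a role vector to recover its filler (orthonormal roles)."""
    return tuple(sum(row[j] * r[j] for j in range(len(r))) for row in v)

# The string "cba" as the unordered binding set {c/r1, b/r2, a/r3}.
s = superpose(bind(fillers["c"], roles[1]),
              bind(fillers["b"], roles[2]),
              bind(fillers["a"], roles[3]))

assert unbind(s, roles[1]) == fillers["c"]
assert unbind(s, roles[2]) == fillers["b"]
assert unbind(s, roles[3]) == fillers["a"]
```

Because the bindings are superposed into a single tensor, the order in which they are added is irrelevant, which is exactly the "unordered set of bindings" view in the text.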
¹The other major kind of role decomposition considered in (Smolensky 1990) is contextual roles; under one such decomposition, one constituent of cba is \"b in the role 'preceded by c and followed by a'\". \n\nIn a tensor product representation a constituent - i.e., a filler/role binding - is represented by the tensor (or generalized outer) product of vectors representing the filler and role in isolation: f/r is represented by the vector v = f⊗r, which is in fact a second-rank tensor whose elements are conveniently labelled by two subscripts and defined simply by v_φρ = f_φ r_ρ. \n\nWhere do the filler and role vectors f and r come from? In the most straightforward case, each filler is a member of a simple set F (e.g. an alphabet) and each role is a member of a simple set R, and the designer of the representation simply specifies vectors representing all the elements of F and R. In more complex cases, one or both of the sets F and R might be sets of structures which in turn can be viewed as having constituents, and which in turn can be represented using a tensor product representation. This recursive construction of tensor product representations leads to tensor products of three or more vectors, creating tensors of rank three and higher, with elements conveniently labelled by three or more subscripts. \n\nThe recursive structure of trees leads naturally to such a recursive construction of a tensor product representation. (The following analysis builds on Section 3.7.2 of (Smolensky 1990).) We consider binary trees (in which every node has at most two children) since the techniques developed below generalize immediately to trees with higher branching factor, and since the power of binary trees is well attested, e.g., by the success of Lisp, whose basic data structure is the binary tree. Adopting the conventions and notations of Lisp, we assume for simplicity that the terminal nodes 
of the tree (those with no children), and only the terminal nodes, are labelled by symbols or atoms. The set of structures S we want to represent is the union of a set of atoms and the set of binary trees with terminal nodes labelled by these atoms. \n\nOne way to view a binary tree, by analogy with how we viewed strings above, is as having a large number of positions with various locations relative to the root: we adopt positional roles r_x labelled by binary strings (or bit vectors) such as x = 0110, which is the position in a tree accessed by \"caddar = car(cdr(cdr(car)))\", that is, the left child (0; car) of the right child (1; cdr) of the right child of the left child of the root of the tree. Using this role decomposition, each constituent of a tree is an atom (the filler) bound to some role r_x specifying its location; so if a tree s has a set of atoms {f_i} at respective locations {x_i}, then the vector representing s is s = Σ_i f_i⊗r_{x_i}. \n\nA more recursive view of a binary tree sees it as having only two constituents: the atoms or subtrees which are the left and right children of the root. In this fully recursive role decomposition, fillers may either be atoms or trees: the set of possible fillers F is the same as the original set of structures S. \n\nThe fully recursive role decomposition can be incorporated into the tensor product framework by making the vector spaces and operations a little more complex than in (Smolensky 1990). The goal is a representation obeying, for all s, p, q ∈ S: \n\ns = cons(p, q) => s = p⊗r0 + q⊗r1     (1) \n\nHere, s = cons(p, q) is the tree with left subtree p and right subtree q, while s, p and q are the vectors representing s, p and q. The only two roles in this recursive decomposition are r0, r1: the left and right children of the root. 
These roles are represented by two vectors r0 and r1. \n\nA fully recursive representation obeying Equation 1 can actually be constructed from the positional representation, by assuming that the (many) positional role vectors are constructed recursively from the (two) fully recursive role vectors according to: \n\nr_x0 = r_x⊗r0     r_x1 = r_x⊗r1. \n\nFor example, r_0110 = r0⊗r1⊗r1⊗r0.² Thus the vectors representing positions at depth d in the tree are tensors of rank d (taking the root to be depth 0). As an example, the tree s = cons(A, cons(B, C)) = cons(p, q), where p = A and q = cons(B, C), is represented by \n\ns = A⊗r0 + B⊗r01 + C⊗r11 = A⊗r0 + B⊗r0⊗r1 + C⊗r1⊗r1 = A⊗r0 + (B⊗r0 + C⊗r1)⊗r1 = p⊗r0 + q⊗r1, \n\nin accordance with Equation 1. \n\n²By adopting this definition of r_x, we are essentially taking the recursive structure that is implicit in the subscripts x labelling the positional role vectors, and mapping it into the structure of the vectors themselves. \n\nThe complication in the vector spaces needed to accomplish this recursive analysis is one that allows us to add together the tensors of different ranks representing different depths in the tree. All we need do is take the direct sum of the spaces of tensors of different rank; in effect, concatenating into a long vector all the elements of the tensors. For example, in s = cons(A, cons(B, C)), depth 0 is 0, since s isn't an atom; depth 1 contains A, represented by the tensor S^(1)_φρ1 = A_φ r0_ρ1; and depth 2 contains B and C, represented by S^(2)_φρ1ρ2 = B_φ r0_ρ1 r1_ρ2 + C_φ r1_ρ1 r1_ρ2. The tree as a whole is then represented by the sequence s = {S^(0)_φ, S^(1)_φρ1, S^(2)_φρ1ρ2, ...}, where the tensor for depth 0, S^(0), and the tensors for depths d > 2, S^(d)_φρ1...ρd, are all zero. 
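A minimal sketch of this fully recursive construction, assuming illustrative orthonormal atom and role vectors (the dict keyed by index tuples (φ, ρ1, ..., ρd) is our own stand-in for the direct sum of tensor spaces just described):

```python
r0, r1 = (1, 0), (0, 1)                      # role vectors for left/right child
A, B, C = (1, 0, 0), (0, 1, 0), (0, 0, 1)    # three atom (filler) vectors

def atom(f):
    """An atom occupies depth 0: a rank-1 tensor indexed by phi alone."""
    return {(phi,): v for phi, v in enumerate(f)}

def outer(t, r):
    """Tensor product t x r: append one role index to every element of t."""
    return {idx + (rho,): v * r[rho] for idx, v in t.items() for rho in (0, 1)}

def cons(p, q):
    """Equation 1: s = p x r0 + q x r1."""
    s = outer(p, r0)
    for idx, v in outer(q, r1).items():
        s[idx] = s.get(idx, 0) + v
    return s

s = cons(atom(A), cons(atom(B), atom(C)))    # the tree (A . (B . C))

# Depth 1 holds A x r0; depth 2 holds B x r0 x r1 + C x r1 x r1.
assert s[(0, 0)] == 1        # A's nonzero component, bound to r0
assert s[(1, 0, 1)] == 1     # B at position r01 = r0 x r1
assert s[(2, 1, 1)] == 1     # C at position r11 = r1 x r1
```

Note how cons appends the new role index at the root-most end of each index tuple, which is exactly the recursion r_x0 = r_x⊗r0 acting on the stored tensors.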
We let V denote the vector space of such sequences of tensors of rank 0, rank 1, ..., up to some maximum depth D, which may be infinite. Two elements of V are added (or \"superimposed\") simply by adding together the tensors of corresponding rank. This is our vector space for representing trees.³ \n\nThe vector operation cons for building the representation of a tree from that of its two subtrees is given by Equation 1. As an operation on V this can be written: \n\ncons: ({P^(0)_φ, P^(1)_φρ1, P^(2)_φρ1ρ2, ...}, {Q^(0)_φ, Q^(1)_φρ1, Q^(2)_φρ1ρ2, ...}) ↦ {0, P^(0)_φ r0_ρ1 + Q^(0)_φ r1_ρ1, P^(1)_φρ1 r0_ρ2 + Q^(1)_φρ1 r1_ρ2, ...} \n\n(Here, 0 denotes the zero vector in the space representing atoms.) In terms of matrices multiplying vectors in V, this can be written \n\ncons(p, q) = W_cons0 p + W_cons1 q \n\n(parallel to Equation 1), where the non-zero elements of the matrix W_cons0 are \n\n(W_cons0)_{φρ1ρ2...ρd ρd+1, φρ1ρ2...ρd} = r0_ρd+1 \n\nand W_cons1 is gotten by replacing r0 with r1. \n\nTaking the car or cdr of a tree - extracting the left or right child - in the recursive decomposition is equivalent to unbinding either r0 or r1. As shown in (Smolensky 1990, Section 3.1), if the role vectors are linearly independent, this unbinding can be performed accurately, via a linear operation, specifically, a generalized inner product (tensor contraction) of the vector representing the tree with an unbinding vector u0 or u1. In general, the unbinding vectors are the dual basis to the role vectors; equivalently, they are the vectors comprising the inverse matrix to the matrix of all role vectors. If the role vectors are orthonormal (as in the simulation discussed below), the unbinding vectors are the same as the role vectors. 
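The unbinding of r0 and r1 can be sketched as follows, in the orthonormal case where u0 = r0 and u1 = r1 (vectors and the dict-based tensor storage are illustrative choices, not the paper's implementation):

```python
r0, r1 = (1, 0), (0, 1)          # orthonormal roles, so u0 = r0 and u1 = r1
u0, u1 = r0, r1
A, B, C = (1, 0, 0), (0, 1, 0), (0, 0, 1)

def atom(f):
    return {(phi,): v for phi, v in enumerate(f) if v}

def outer(t, r):
    return {idx + (rho,): v * r[rho]
            for idx, v in t.items() for rho in (0, 1) if v * r[rho]}

def add(t, u):
    out = dict(t)
    for idx, v in u.items():
        out[idx] = out.get(idx, 0) + v
    return {k: v for k, v in out.items() if v}

def cons(p, q):                   # Equation 1: p x r0 + q x r1
    return add(outer(p, r0), outer(q, r1))

def unbind(s, u):
    """Generalized inner product with u on the last (root-most) role index."""
    out = {}
    for idx, v in s.items():
        if len(idx) > 1:          # the depth-0 (atom) part has no role to unbind
            out[idx[:-1]] = out.get(idx[:-1], 0) + v * u[idx[-1]]
    return {k: v for k, v in out.items() if v}

car = lambda s: unbind(s, u0)     # extract the left child
cdr = lambda s: unbind(s, u1)     # extract the right child

s = cons(atom(A), cons(atom(B), atom(C)))
assert car(s) == atom(A)                   # left child recovered exactly
assert cdr(s) == cons(atom(B), atom(C))    # right child recovered exactly
```

With orthonormal roles the contraction is exact, mirroring the claim that linearly independent role vectors permit accurate unbinding.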
The car operation can be written explicitly as an operation on V: \n\ncar: {S^(0)_φ, S^(1)_φρ1, S^(2)_φρ1ρ2, ...} ↦ {Σ_ρ1 S^(1)_φρ1 u0_ρ1, Σ_ρ2 S^(2)_φρ1ρ2 u0_ρ2, Σ_ρ3 S^(3)_φρ1ρ2ρ3 u0_ρ3, ...} \n\n(Replacing u0 by u1 gives cdr.) The operation car can be realized as a matrix W_car mapping V to V with non-zero elements: \n\n(W_car)_{φρ1ρ2...ρd, φρ1ρ2...ρd ρd+1} = u0_ρd+1. \n\nW_cdr is the same matrix, with u0 replaced by u1.⁴ \n\n³In the connectionist implementation simulated below, there is one unit for each element of each tensor in the sequence. In the simulation we report, seven atoms are represented by (binary) vectors in a three-dimensional space, so φ = 0, 1, 2; r0 and r1 are vectors in a two-dimensional space, so ρ = 0, 1. The number of units representing the portion of V for depth d is thus 3·2^d, and the total number of units representing depths up to D is 3(2^(D+1) − 1). In tensor product representations, exact representation of deeply embedded structure does not come cheap. \n\nOne of the main points of developing this connectionist representation of trees is to enable massively parallel processing. Whereas in the traditional sequential implementation of Lisp, symbol processing consists of a long sequence of car, cdr, and cons operations, here we can compose together the corresponding sequence of W_car, W_cdr, W_cons0 and W_cons1 operations into a single matrix operation. Adding some minimal nonlinearity allows us to compose more complex operations incorporating the equivalent of conditional branching. We now illustrate this with a simple linguistically motivated example. \n\n3 An example \n\nThe symbol manipulation problem we consider is that of transforming a tree representation of a syntactic parse of an English sentence into a tree representation of a predicate-calculus expression for the meaning of the sentence. 
We considered two possible syntactic structures: simple active sentences of the form (A.(V.P)) and passive sentences of the form (P.((Aux.V).(by.A))). Each was to be transformed into a tree representing V(A,P), namely (V.(A.P)). Here, the agent A and patient P of the verb V are both arbitrarily complex noun phrase trees. (Actually, the network could handle arbitrarily complex V's as well.) Aux is a marker for passive (e.g. is in is feared). \n\nThe network was presented with an input tree of either type, represented as an activation vector using the fully recursive tensor product representation developed in the preceding section. The seven non-zero binary vectors of length three coded seven atoms; the role vectors used were those described above. The desired output was the same tensorial representation of the tree representing V(A,P). The filler vectors for the verb and for the constituent words of the two noun phrases should be unbound from their roles in the input tree and then bound to the appropriate roles in the output tree. \n\nThis transformation was performed, for an active sentence, by the operation cons(cadr(s), cons(car(s), cddr(s))) on the input tree s, and for a passive sentence, by cons(cdadr(s), cons(cdddr(s), car(s))). These operations were implemented in the network as two weight matrices, Wa and Wp,⁵ connecting the input units to the output units as shown in Figure 1. In addition, the network had a circuit for \n\n⁴Note that in the case when {r0, r1} are orthonormal, and therefore u0 = r0, W_car = (W_cons0)^T; similarly, W_cdr = (W_cons1)^T. \n\n⁵The two weight matrices were constructed from the four basic matrices as Wa = W_cons0 W_car W_cdr + W_cons1 (W_cons0 W_car + W_cons1 W_cdr W_cdr) and Wp = W_cons0 W_cdr W_car W_cdr + W_cons1 (W_cons0 W_cdr W_cdr W_cdr + W_cons1 W_car). 
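The active-sentence transformation can be sketched end-to-end with single-atom noun phrases (here the car, cdr and cons operations are composed functionally, where the paper's network instead composes the corresponding matrices into Wa; all vectors and the dict-based storage are illustrative choices):

```python
r0, r1 = (1, 0), (0, 1)          # orthonormal roles: unbinding reuses r0, r1

def atom(f):
    return {(phi,): v for phi, v in enumerate(f) if v}

def outer(t, r):
    return {i + (p,): v * r[p] for i, v in t.items() for p in (0, 1) if v * r[p]}

def add(t, u):
    out = dict(t)
    for i, v in u.items():
        out[i] = out.get(i, 0) + v
    return {k: v for k, v in out.items() if v}

def cons(p, q):                  # Equation 1
    return add(outer(p, r0), outer(q, r1))

def unbind(s, r):                # contract the last role index with r
    out = {}
    for i, v in s.items():
        if len(i) > 1:
            out[i[:-1]] = out.get(i[:-1], 0) + v * r[i[-1]]
    return {k: v for k, v in out.items() if v}

car = lambda s: unbind(s, r0)
cdr = lambda s: unbind(s, r1)

# Active parse (A . (V . P)): agent A, verb V, patient P (single atoms here).
A, V, P = atom((1, 0, 0)), atom((0, 1, 0)), atom((0, 0, 1))
s = cons(A, cons(V, P))

out = cons(car(cdr(s)), cons(car(s), cdr(cdr(s))))   # cons(cadr, cons(car, cddr))
assert out == cons(V, cons(A, P))                    # the tree for V(A, P)
```

Since every step is linear, the whole composition collapses into one matrix acting on V, which is the point of the Wa and Wp constructions in footnote 5.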
Output = cons(V, cons(C, cons(A,B))) \n\nInput = cons(cons(A,B), cons(cons(Aux,V), cons(by,C))) \n\nFigure 1: Recursive tensor product network processing a passive sentence \n\ndetermining whether the input sentence was active or passive. In this example, it simply computed, by a weight matrix, the caddr of the input tree (where a passive sentence should have an Aux), and if it was the marker Aux, gated (with sigma-pi connections) Wp, and otherwise gated Wa. \n\nGiven this setting, the network was able to properly process arbitrary input sentences of either type, up to a certain depth (4 in this example) limited by the size of the network, and generated correct case role assignments. Figure 1 shows the network processing a passive sentence ((A.B).((Aux.V).(by.C))), as in All connectionists are feared by Minsky, and generating (V.(C.(A.B))) as output. \n\n4 Discussion \n\nThe formalism developed here for the recursive representation of trees generates quite different representations depending on the choice of the two fundamental role vectors r0 and r1 and the vectors for representing the atoms. At one extreme is the trivial fully local representation in which one connectionist unit is dedicated to each possible atom in each possible position: this is the special case in which r0 and r1 are chosen to be the canonical basis vectors (1 0) and (0 1), and the vectors representing the n atoms are also chosen to be the canonical basis vectors of n-space. The example of the previous section illustrated the case of (a) linearly dependent vectors for atoms and (b) orthonormal vectors for the roles that were \"distributed\" in that both elements of both vectors were non-zero. 
Property (a) permits the representation of many more than n atoms with n-dimensional vectors, and could be used to enrich the usual notions of symbolic computation by letting \"similar atoms\" be represented by vectors that are closer to each other than are \"dissimilar atoms.\" Property (b) contributes no savings in units over the purely local case, amounting to a literal rotation in role space. But it does allow us to demonstrate that fully distributed representations are as capable as fully local ones at supporting massively parallel structure processing. This point has been denied (often rather loudly) by advocates of local representations and by such critics as (Fodor & Pylyshyn 1988) and (Fodor & McLaughlin 1990), who have claimed that only connectionist implementations that preserve the concatenative structure of language-like representations of symbolic structures could be capable of true structure-sensitive processing. \n\nThe case illustrated in our example is distributed in the sense that all units corresponding to depth d in the tree are involved in the representation of all the atoms at that depth. But different depths are kept separate in the formalism and in the network. We can go further by allowing the role vectors to be linearly dependent, sacrificing full accuracy and generality in structure processing for representation of greater depth in fewer units. This case is the subject of current research, but space limitations have prevented us from describing our preliminary results here. \n\nReturning to Harmonic Grammar, the next question is: having developed a fully recursive tensor product representation for lower-level representation of embedded structures such as those ubiquitous in syntax, what are the implications for well-formedness as measured by the harmony function? 
A first approximation to the natural language case is captured by context-free grammars, in which the well-formedness of a subtree is independent of its level of embedding. It turns out that such depth-independent well-formedness is captured by a simple equation governing the harmony function (or weight matrix). At the higher level where grammatical \"rules\" of Harmonic Grammar reside, this has the consequence that the numerical constant appearing in each soft constraint that constitutes a \"rule\" applies at all levels of embedding. This greatly constrains the parameters in the grammar. \n\nReferences \n\n[1] J. A. Fodor and B. P. McLaughlin. Connectionism and the problem of systematicity: Why Smolensky's solution doesn't work. Cognition, 35:183-204, 1990. \n\n[2] J. A. Fodor and Z. W. Pylyshyn. Connectionism and cognitive architecture: A critical analysis. Cognition, 28:3-71, 1988. \n\n[3] G. Legendre, Y. Miyata, and P. Smolensky. Harmonic grammar - a formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. In Proceedings of the Twelfth Meeting of the Cognitive Science Society, 1990a. \n\n[4] G. Legendre, Y. Miyata, and P. Smolensky. Harmonic grammar - a formal multi-level connectionist theory of linguistic well-formedness: An application. In Proceedings of the Twelfth Meeting of the Cognitive Science Society, 1990b. \n\n[5] P. Smolensky. Tensor product variable binding and the representation of symbolic structures in connectionist networks. Artificial Intelligence, 46:159-216, 1990.", "award": [], "sourceid": 406, "authors": [{"given_name": "Geraldine", "family_name": "Legendre", "institution": null}, {"given_name": "Yoshiro", "family_name": "Miyata", "institution": null}, {"given_name": "Paul", "family_name": "Smolensky", "institution": null}]}