{"title": "A Lagrangian Approach to Fixed Points", "book": "Advances in Neural Information Processing Systems", "page_first": 77, "page_last": 83, "abstract": null, "full_text": "A Lagrangian Approach to Fixed Points \n\nEric Mjolsness \nDepartment of Computer Science \nYale University \nP.O. Box 2158 Yale Station \nNew Haven, CT 16520-2158 \n\nWillard L. Miranker \nIBM Watson Research Center \nYorktown Heights, NY 10598 \n\nAbstract \n\nWe present a new way to derive dissipative, optimizing dynamics from \nthe Lagrangian formulation of mechanics. It can be used to obtain both \nstandard and novel neural net dynamics for optimization problems. To \ndemonstrate this we derive standard descent dynamics as well as nonstan(cid:173)\ndard variants that introduce a computational attention mechanism. \n\n1 \n\nINTRODUCTION \n\nNeural nets are often designed to optimize some objective function E of the current \nstate of the system via a dissipative dynamical system that has a circuit-like imple(cid:173)\nmentation. The fixed points of such a system are locally optimal in E. In physics the \npreferred formulation for many dynamical derivations and calculations is by means \nof an objective function which is an integral over time of a \"Lagrangian\" function, \nL. From Lagrangians one usually derives time-reversable, non-dissipative dynamics \nwhich cannot converge to a fixed point, but we present a new way to circumvent \nthis limitation and derive optimizing neural net dynamics from a Lagrangian. We \napply the method to derive a general attention mechanism for optimization-based \nneural nets, and we describe simulations for a graph-matching network. \n\n2 LAGRANGIAN FORMULATION OF NEURAL \n\nDYNAMICS \n\nOften one must design a network with nontrivial temporal behaviors such as run(cid:173)\nning longer in exchange for less circuitry, or focussing attention on one part of a \n\n77 \n\n\f78 Mjolsness and Miranker \n\nproblem at a time. 
In this section we transform the original objective function (cf. [Mjolsness and Garrett, 1989]) into a Lagrangian which determines the detailed dynamics by which the objective is optimized. In section 3.1 we will show how to add in an extra level of control dynamics. \n\n2.1 THE LAGRANGIAN \n\nReplacing an objective E with an associated Lagrangian, L, is an algebraic transformation: \n\nL[v, v̇ | q] = K[v, v̇ | q] + dE/dt. (1) \n\nThe \"action\" S = ∫_{-∞}^{∞} L dt is to be extremized in a novel way: \n\nδS/δv̇_i(t) = 0. (2) \n\nIn (1), q is an optional set of control parameters (see section 3.1) and K is a cost-of-movement term independent of the problem and of E. For one standard class of neural networks, \n\nE[v] = -(1/2) Σ_ij T_ij v_i v_j - Σ_i h_i v_i + Σ_i φ_i(v_i), (3) \n\nso \n\n-∂E/∂v_i = Σ_j T_ij v_j + h_i - g^{-1}(v_i), (4) \n\nwhere g^{-1}(v) = φ'(v). Also dE/dt is of course Σ_i (∂E/∂v_i) v̇_i. \n\n2.2 THE GREEDY FUNCTIONAL DERIVATIVE \n\nIn physics, Lagrangian dynamics usually have a conserved total energy which prohibits convergence to fixed points. Here the main difference is the unusual functional derivative with respect to v̇ rather than v in equation (2). This is a \"greedy\" functional derivative, in which the trajectory is optimized from beginning to each time t by choosing an extremal value of v̇(t) without considering its effect on any subsequent portion of the trajectory: \n\nδ/δv̇_i(t) ∫_{-∞}^{t} dt' L[v, v̇] ≈ δ(0) ∂L[v, v̇]/∂v̇_i(t) = δ(0) δ/δv̇_i(t) ∫_{-∞}^{∞} dt' L[v, v̇] ∝ δS/δv̇_i(t). (5) \n\nSince \n\nδS/δv̇_i = ∂L/∂v̇_i = ∂E/∂v_i + ∂K/∂v̇_i, (6) \n\nequations (1) and (2) preserve fixed points (where ∂E/∂v_i = 0) if ∂K/∂v̇_i = 0 ⇔ v̇ = 0. \n\n2.3 STEEPEST DESCENT DYNAMICS \n\nFor example, with K = Σ_i φ(v̇_i/r) one may recover and generalize steepest-descent dynamics: \n\nL[v̇ | r] = Σ_i φ(v̇_i/r) + Σ_i (∂E/∂v_i) v̇_i. (7) \n\nFigure 1: (a) Greedy functional derivatives result in greedy optimization: the \"next\" point in a trajectory is chosen on the basis of previous points but not future ones. (b) Two time variables t and T may increase during nonoverlapping intervals of an underlying physical time variable, τ. For example t = ∫ dτ θ_1(τ) and T = ∫ dτ θ_2(τ). \n\nE_benefit(r) = -b[Σ_i (Σ_a r_ia) g'(u_i) (∂E/∂v_i)^2], (21) \n\nwhere ∂E/∂v_i = Σ_j T_ij v_j + h_i - u_i. If we assume that E_cost favors fixed points for which r_ia ≈ 0 or 1 and Σ_i r_ia ≈ 0 or 1, there is a fixed-point-preserving transformation of (21) to \n\nE_benefit(r) = -b[Σ_ia r_ia g'(u_i) (∂E/∂v_i)^2]. (22) \n\nThis is monotonic in a linear function of r. It remains to specify E_cost and a kinetic energy term K. \n\n3.2 INDEPENDENT VIRTUAL NEURONS \n\nFirst consider independent r_ia. As in the Tank-Hopfield [Tank and Hopfield, 1986] linear programming net, E_cost can be chosen so that the r dynamics just sorts the virtual neurons and chooses the A neurons with largest g'(u_i) ∂E/∂v_i. For dynamics, we introduce a new time variable T that may not even be proportional to t (see figure 1b) and imitate the Lagrangians for Hopfield dynamics: \n\nL = Σ_ia (1/2) g'(p_ia) (dp_ia/dT)^2 + (d/dT)(E_benefit + E_cost). (24) \n\n3.3 JUMPING WINDOW OF ATTENTION \n\nA far more cost-effective net involves partitioning the virtual neurons into real-net-sized blocks indexed by α, so i → (α, a) where a indexes neurons within a block.
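A minimal numerical sketch of such block-restricted descent is given below, using the simpler control rule reported for the simulations of Section 4, which attends to the block with the largest Σ_a |∂E/∂v|. This is not the authors' simulation code: the weights T and h, the transfer function g = tanh, the network size, and the Euler step size are all illustrative assumptions.

```python
import numpy as np

# Sketch (illustrative, not the authors' code): steepest descent on the
# Hopfield-style energy of eq. (3), restricted at each step to the one
# block of real neurons currently in the window of attention.
rng = np.random.default_rng(0)
n_blocks, block_size = 4, 8
n = n_blocks * block_size

T = rng.normal(size=(n, n))
T = (T + T.T) / 2               # symmetric connection matrix
np.fill_diagonal(T, 0.0)
h = rng.normal(size=n)

g = np.tanh                     # transfer function, so g^{-1}(v_i) = u_i
u = rng.normal(scale=0.1, size=n)   # internal states
dt = 0.05

for step in range(200):
    v = g(u)
    grad = -(T @ v + h - u)     # dE/dv_i from eq. (4) with g = tanh
    # attention rule from Section 4: block with largest sum of |dE/dv|
    per_block = np.abs(grad).reshape(n_blocks, block_size).sum(axis=1)
    a = int(np.argmax(per_block))
    window = slice(a * block_size, (a + 1) * block_size)
    u[window] -= dt * grad[window]   # descend only inside the window
```

At each step only one block of real neurons moves downhill in E, mimicking a jumping window of attention over the virtual neurons.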
\nLet x_α ∈ [0,1] indicate which block is the current window or focus of attention. Using (22), this implies \n\nE_benefit[x] = -b[Σ_α x_α Σ_a g'(u_αa) (∂E/∂v_αa)^2]. (26) \n\nSince E_cost here favors Σ_α x_α = 1 and x_α ∈ {0, 1}, E_benefit has the same fixed points as, and can be replaced by, \n\nE_benefit[x] = -Σ_α x_α b[Σ_a g'(u_αa) (∂E/∂v_αa)^2]. (29) \n\nThen the dynamics for x is just that of a winner-take-all neural net among the blocks, which will select the largest value of b[Σ_a g'(u_αa) (∂E/∂v_αa)^2]. The simulations of Section 4 report on an earlier version of this control scheme, which selected instead the block with the largest value of Σ_a |∂E/∂v_αa|. \n\n3.4 ROLLING WINDOW OF ATTENTION \n\nHere the r variables for a neural net embedded in a d-dimensional space are determined by a vector x representing the geometric position of the window. E_cost can be dropped entirely, and E can be calculated from r(x). Suppose the embedding is via a d-dimensional grid which for notational purposes is partitioned into window-sized squares indexed by integer-valued vectors α and a. Then r(x) is determined by a window function w centered at x, (30) where \n\n∂w(x)/∂x_μ = b[1/4 - (x_μ + L)^2] if -1/2 ≤ x_μ + L < 1/2; b[(x_μ - L)^2 - 1/4] if -1/2 ≤ x_μ - L < 1/2; 0 otherwise, (31) \n\nand \n\nE[x] = -b[Σ_αa w(Lα + a - x) g'(u_αa) (∂E/∂v_αa)^2]. (32) \n\nThe advantage of (30) over, for example, a jumping or sliding window of attention is that only a small number of real neurons are being reassigned to new virtual neurons at any one time. \n\n3.4.1 Dynamics of a Rolling Window \n\nA candidate Lagrangian is \n\nL[x] = (1/2) Σ_μ (dx_μ/dT)^2 + Σ_μ (∂E/∂x_μ)(dx_μ/dT), (33) \n\nwhence greedy variation δS/δẋ = 0 yields \n\ndx_μ/dT = -[Σ_αa (∂w(x - Lα - a)/∂x_μ) g'(u_αa) (∂E/∂v_αa)^2] × b'[Σ_αa w g'(u_αa) (∂E/∂v_αa)^2]. (34) \n\nWe may also calculate that the linearized dynamics' eigenvalues can be bounded away from infinity and zero. \n\n4 SIMULATIONS \n\nA jumping window of attention was simulated for a graph-matching network in which the matching neurons were partitioned into groups, only one of which was active (r_ia = 1) at any given time. The resulting optimization method produced solutions of similar quality to the original neural network, but had a smaller requirement for computational space resources at any given time. \n\nAcknowledgement: Charles Garrett performed the computer simulations. \n\nReferences \n\n[Mjolsness, 1987] Mjolsness, E. (1987). Control of attention in neural networks. In Proc. of First International Conference on Neural Networks, volume II, pages 567-574. IEEE. \n\n[Mjolsness and Garrett, 1989] Mjolsness, E. and Garrett, C. (1989). Algebraic transformations of objective functions. Technical Report YALEU/DCS/RR686, Yale University Computer Science Department. Also, in press for Neural Networks. \n\n[Tank and Hopfield, 1986] Tank, D. W. and Hopfield, J. J. (1986). Simple 'neural' optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit. IEEE Transactions on Circuits and Systems, CAS-33. \n", "award": [], "sourceid": 398, "authors": [{"given_name": "Eric", "family_name": "Mjolsness", "institution": null}, {"given_name": "Willard", "family_name": "Miranker", "institution": null}]}