{"title": "HIGH DENSITY ASSOCIATIVE MEMORIES", "book": "Neural Information Processing Systems", "page_first": 211, "page_last": 218, "abstract": "", "full_text": "HIGH DENSITY ASSOCIATIVE MEMORIES\u00b9\n\nAmir Dembo\nInformation Systems Laboratory, Stanford University\nStanford, CA 94305\n\nOfer Zeitouni\nLaboratory for Information and Decision Systems\nMIT, Cambridge, MA 02139\n\nABSTRACT\n\nA class of high density associative memories is constructed, starting from a description of the desired properties those memories should exhibit. These properties include high capacity, controllable basins of attraction and fast speed of convergence. Fortunately enough, the resulting memory is implementable by an artificial Neural Net.\n\nINTRODUCTION\n\nMost of the work on associative memories has been structure oriented, i.e., given a Neural architecture, efforts were directed towards the analysis of the resulting network. Issues like capacity, basins of attraction, etc. were the main objects to be analyzed; cf., e.g., [1], [2], [3], [4] and references there, among others.\n\nIn this paper, we take a different approach: we start by explicitly stating the desired properties of the network, in terms of capacity, etc. Those requirements are given in terms of axioms (cf. below). Then, we bring a synthesis method which enables one to design an architecture which will yield the desired performance. Surprisingly enough, it turns out that one gets rather easily the following properties:\n\n(a) High capacity (unlimited in the continuous state-space case, bounded only by sphere-packing bounds in the discrete state case).\n\n(b) Guaranteed basins of attraction in terms of the natural metric of the state space.\n\n(c) High speed of convergence in the guaranteed basins of attraction. 
\n\nMoreover, it turns out that the architecture suggested below is the only one which satisfies all our axioms (\"desired properties\")!\n\nOur approach is based on defining a potential and following a descent algorithm (e.g., a gradient algorithm). The main design task is to construct such a potential (and, to a lesser extent, an implementation of the descent algorithm via a Neural network). In doing so, it turns out that, for reasons described below, it is useful to regard each desired memory location as a \"particle\" in the state space. It is natural to require now the following requirements from a memory:\n\n(P1) The potential should be linear w.r.t. adding particles, in the sense that the potential of two particles should be the sum of the potentials induced by the individual particles (i.e., we do not allow interparticle interaction).\n\n(P2) Particle locations are the only possible sites of stable memory locations.\n\n(P3) The system should be invariant to translations and rotations of the coordinates.\n\nWe note that the last requirement is made only for the sake of simplicity. It is not essential and may be dropped without affecting the results.\n\nIn the sequel, we construct a potential which satisfies the above requirements. We refer the reader to [5] for details of the proofs, etc.\n\nAcknowledgements. We would like to thank Prof. L.N. Cooper and C.M. Bachmann for many fruitful discussions. In particular, section 2 is part of a joint work with them ([6]).\n\n\u00b9An expanded version of this work has been submitted to Phys. Rev. A. This work was carried out at the Center for Neural Science, Brown University.\n\n\u00a9 American Institute of Physics 1988\n\n2. HIGH DENSITY STORAGE MODEL\n\nIn what follows we present a particular case of a method for the construction of a high storage density neural memory. 
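As a concrete preview (our own sketch, not part of the paper; all function names are ours): the potential constructed below is a sum of single-particle inverse-power terms, so it satisfies (P1) by construction, and it satisfies (P3) because each term depends only on the Euclidean distance from the state to its particle.

```python
import math

def single_potential(x, p, Q=1.0, L=1):
    # Potential induced at x by one "particle" (stored memory) at p with
    # charge -Q: the single-particle term -Q * |x - p|^(-L).
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))
    return -Q * d ** (-L)

def total_potential(x, particles, Q=1.0, L=1):
    # (P1): the potential of several particles is the sum of the
    # potentials induced by the individual particles.
    return sum(single_potential(x, p, Q, L) for p in particles)
```

(P3) can be checked numerically by shifting the state and all particles by the same vector and observing that the potential is unchanged.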
We define a function with an arbitrary number of minima that lie at preassigned points and define an appropriate relaxation procedure. The general case is presented in [5].\n\nLet x_1, ..., x_m be a set of m arbitrary distinct memories in R^N. The \"energy\" function we will use is:\n\nE = - \u03a3_{i=1}^{m} Q_i |x - x_i|^{-L}     (1)\n\nwhere we assume throughout that N \u2265 3, L \u2265 (N - 2), and Q_i > 0, and use |\u00b7| to denote the Euclidean distance. Note that for L = 1, N = 3, E is the electrostatic potential induced by negative fixed particles with charges -Q_i. This \"energy\" function possesses global minima at x_1, ..., x_m (where E(x_i) = -\u221e) and has no local minima except at these points. A rigorous proof is presented in [5], together with the complete characterization of functions having this property.\n\nAs a relaxation procedure, we can choose any dynamical system for which E is strictly decreasing, uniformly in compacts. In this instance, the theory of dynamical systems guarantees that for almost any initial data, the trajectory of the system converges to one of the desired points x_1, ..., x_m. However, to give concrete results and to further exploit the resemblance to electrostatics, consider the relaxation:\n\ndx/dt = \u0112 = -\u2207E = - \u03a3_{i=1}^{m} L Q_i (x - x_i) |x - x_i|^{-(L+2)}     (2)\n\nwhere for N = 3, L = 1, equation (2) describes the motion of a positive test particle in the electrostatic field \u0112 generated by the negative fixed charges -Q_1, ..., -Q_m at x_1, ..., x_m.\n\nSince the field \u0112 is just minus the gradient of E, it is clear that along trajectories of (2), dE/dt \u2264 0, with equality only at the fixed points of (2), which are exactly the stationary points of E. Therefore, using (2) as the relaxation procedure, 
we can conclude that entering at any x(0), the system converges to a stationary point of E. The space of inputs is partitioned into m domains of attraction, each one corresponding to a different memory, and the boundaries (a set of measure zero), on which x(0) will converge to a saddle point of E.\n\nWe can now explain why E has no spurious local minima, at least for L = 1, N = 3, using elementary physical arguments. Suppose E has a spurious local minimum at y \u2260 x_1, ..., x_m; then in a small neighborhood of y which does not include any of the x_i, the field \u0112 points towards y. Thus, on any closed surface in that neighborhood, the integral of the normal inward component of \u0112 is positive. However, this integral is just the total charge included inside the surface, which is zero. Thus we arrive at a contradiction, so y cannot be a local minimum.\n\nWe now have a relaxation procedure, such that almost any x(0) is attracted by one of the x_i, but we have not yet specified the shapes of the basins of attraction. By varying the charges Q_i, we can enlarge one basin of attraction at the expense of the others (and vice versa).\n\nEven when all of the Q_i are equal, the position of the x_i might cause x(0) not to converge to the closest memory, as emphasized in the example in fig. 1. However, let r = min_{1\u2264i<j\u2264m} |x_i - x_j| be the minimal distance between any two memories; then if |x(0) - x_i| \u2264 \u03b8r for a suitable \u03b8 < 1 (depending on L, N and on the ratio 2R/r, where R is the radius of a hypersphere containing all of the memories), it can be shown that x(0) will converge to x_i. Thus, if the memories are densely packed in a hypersphere, by choosing L large enough, convergence to the closest memory is guaranteed for any \"interesting\" input, that is, an input x(0) with a distinct closest memory. The detailed
\nproof of the above property is given in [5]. It is based on bounding the number of x_j, j \u2260 i, in a hypersphere of radius R (R \u226b r) around x_i, bounding the magnitude of the field induced by any x_j, j \u2260 i, on the boundary of such a hypersphere by (R - |x(0) - x_i|)^{-(L+1)}, and finally integrating to show that for |x(0) - x_i| \u2264 \u03b8r with \u03b8 < 1, the convergence of x(0) to x_i is within a finite time T, which behaves like \u03b8^{L+2} for L \u226b 1 and \u03b8 < 1 fixed. Intuitively, the reason for this behaviour is the short-range nature of the fields used in equation (2). Because of this, we also expect an extremely low convergence rate for inputs x(0) far away from all of the x_i.\n\nThe radial nature of these fields suggests a way to overcome this difficulty, to increase the convergence rate from points very far away without disturbing all of the aforementioned desirable properties of the model. Assume that we know in advance that all of the x_i lie inside some large hypersphere S around the origin. Then, at any point x outside S, the field \u0112 has a positive projection radially into S. By adding a long-range force to \u0112, effective only outside of S, we can hasten the movement towards S from points far away, without creating additional minima inside of S. As an example, the force (-x for x \u2209 S, 0 for x \u2208 S) will pull any test input x(0) to the boundary of S within a small finite time, and from then on the system will behave inside S according to the original field \u0112.\n\nFigure 1 (R \u226b 1 and d \u226a 1).\n\nUp to this point, our derivations have been for a continuous system, but from it we can deduce a discrete system. We shall do this mainly for a clearer comparison between our high density memory model and the discrete version of Hopfield's model. Before continuing in that direction, note that our continuous system has unlimited storage capacity, unlike Hopfield's continuous system, which, like his discrete model, has limited capacity.\n\nFor the discrete system, assume that the x_i are composed of elements \u00b11 and replace the Euclidean distance in (1) with the normalized Hamming distance |x_1 - x_2| = (1/N) \u03a3_{j=1}^{N} |x_1^j - x_2^j|. This places the vectors x_i on the unit hypersphere.\n\nThe relaxation process for the discrete system will be of the type defined in Hopfield's model in [1]: Choose at random a component to be updated (that is, a neighbor x' of x such that |x' - x| = 2/N), calculate the \"energy\" difference \u0394E = E(x') - E(x), and if and only if \u0394E < 0, change this component, that is:\n\nx_j \u2190 x_j \u00b7 sign(E(x^j) - E(x)),     (3)\n\nwhere E(x) is the potential energy in (1) and x^j denotes x with its j-th component reversed. Since there is a finite number of possible x vectors (2^N), convergence in finite time is guaranteed.\n\nThis relaxation procedure is rigid, since the movement is limited to points with components \u00b11. Therefore, although the local minima of E(x) defined in (1) are only at the desired points x_i, the relaxation may get stuck at some x which is not a stationary point of E(x). However, the short-range behaviour of the potential E(x), unlike the long-range behaviour of the quadratic potential used by Hopfield, gives rise to results similar to those we have quoted for the continuous model (equation (1)).\n\nSpecifically, 
let the stored memories x_1, ..., x_m be separated from one another by having at least \u03c1N different components (0 < \u03c1 \u2264 1/2 and \u03c1 fixed), and let x(0) agree with at least one x_i up to at most \u03b5\u03c1N errors between them (0 \u2264 \u03b5 < 1/2, with \u03b5 fixed); then x(0) converges monotonically to x_i by the relaxation procedure given in equation (3).\n\nThis result holds independently of m, provided that N is large enough (typically, N\u03c1 ln(1/\u03b5) \u226b 1) and L is chosen suitably large (see [5]). The proof is constructed by bounding the cumulative effect of the terms |x - x_j|^{-L}, j \u2260 i, on the energy difference \u0394E and showing that it is dominated by |x - x_i|^{-L}. For details, we refer the reader again to [5].\n\nNote the importance of this property: unlike the Hopfield model, which is limited to m \u226a N, the suggested system is optimal in the sense of Information Theory, since for every set of memories x_1, ..., x_m separated from each other by a Hamming distance \u03c1N, up to (1/2)\u03c1N errors in the input can be corrected, provided that N is large and L properly chosen.\n\nAs for the complexity of the system, we note that the nonlinear operation a^{-L}, for a > 0 and L integer (which is at the heart of our system computationally), is equivalent to e^{-L ln a} and can therefore be implemented by a simple electrical circuit composed of diodes, which have exponential input-output characteristics, and resistors, which can carry out the necessary multiplications (cf. the implementation of section 3).\n\nFurther, since both |x_i| and |x| are held fixed in the discrete system, where all states are on the unit hypersphere, |x - x_i|^2 is equivalent to the inner product of x and x_i, up to a constant.\n\nTo conclude, the suggested model involves about m\u00b7N multiplications, followed by m nonlinear operations, and then about m\u00b7N additions. 
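The discrete relaxation (3) and the error-correction behaviour above are easy to simulate. The sketch below is our own illustration (the function names and the small parameter choices N = 16, L = 8 are ours, not the authors'): coordinates are flipped one at a time, and a flip is kept only if it strictly lowers the energy (1) computed with the normalized Hamming distance.

```python
import random

def hamming(x, y):
    # Normalized Hamming distance between two +/-1 vectors.
    return sum(a != b for a, b in zip(x, y)) / len(x)

def energy(x, memories, L):
    # Equation (1) with unit charges Q_i = 1: E(x) = -sum_i d(x, x_i)^(-L).
    # E is -infinity when x coincides with a stored memory.
    total = 0.0
    for xi in memories:
        d = hamming(x, xi)
        if d == 0.0:
            return float("-inf")
        total -= d ** (-L)
    return total

def relax(x, memories, L, steps=1000, seed=0):
    # Relaxation (3): pick a random coordinate, flip it, and keep the
    # flip only if the energy strictly decreases.
    rng = random.Random(seed)
    x = list(x)
    e = energy(x, memories, L)
    for _ in range(steps):
        j = rng.randrange(len(x))
        x[j] = -x[j]
        e_new = energy(x, memories, L)
        if e_new < e:
            e = e_new          # accept the flip
        else:
            x[j] = -x[j]       # reject it
    return x
```

With two memories of length N = 16 disagreeing on 8 coordinates (rho = 1/2) and a probe carrying two errors, this procedure restores the nearest memory, in line with the error-correction claim above; the short-range term for the nearest memory dominates, so wrong flips are rejected.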
The original model of Hopfield involves N\u00b2 multiplications and additions and N nonlinear operations, but is limited to m \u226a N. Therefore, whenever the Hopfield model is applicable, the complexity of both models is comparable.\n\n3. IMPLEMENTATION\n\nWe propose below one possible network which implements the discrete time and space version of the model described above. An implementation for the continuous time case, which is even simpler, is also hinted. We point out that the implementation described below is by no means unique (and maybe even not the simplest one). Moreover, the \"neurons\" used are artificial neurons which perform various tasks, as follows: There are (N+1) neurons which are delay elements, and m pointwise non-linear functions (which may be interpreted as delay-less intermediate neurons). There are about mN synaptic connections between those two layers of neurons. In addition, as in the Hopfield model, we have at each iteration to specify (either deterministically or stochastically) which coordinate we are updating. To do that, we use an N-dimensional \"control register\" whose content is always a unit vector of {0, 1}^N (and the location of the '1' will denote the next coordinate to be changed). This vector may be varied from instant n to n+1 either by shift (\"sequential coordinate update\") or at random.\n\nLet \u03bb_i, 1 \u2264 i \u2264 N, be the i-th output of the \"control\" register; let x_i, 1 \u2264 i \u2264 N, and V be the outputs of the (N+1) neurons, and y_i = x_i(1 - 2\u03bb_i) the corresponding inputs (where x_i, y_i \u2208 {+1, -1} and \u03bb_i \u2208 {0, 1}, but V is a real number); let \u03c9_j, 1 \u2264 j \u2264 m, be the input of the j-th intermediate neuron (-1 \u2264 \u03c9_j \u2264 1), \u03b7_j = -(1 - \u03c9_j)^{-L} be its output, and w_ji = u_i^{(j)}/N be the synaptic weight of the ij-th synapse, where u_i^{(j)} refers here to the i-th element of the j-th memory. 
\nThe system's equations are:\n\ny_i = x_i(1 - 2\u03bb_i),   1 \u2264 i \u2264 N   (4a)\n\n\u03c9_j = \u03a3_{i=1}^{N} w_ji y_i,   1 \u2264 j \u2264 m   (4b)\n\n\u03b7_j = -(1 - \u03c9_j)^{-L},   1 \u2264 j \u2264 m   (4c)\n\nV' = \u03a3_{j=1}^{m} \u03b7_j   (4d)\n\ns = (1/2)(1 - sign(V' - V))   (4e)\n\nx_i \u2190 x_i(1 - 2s\u03bb_i),   1 \u2264 i \u2264 N   (4f)\n\nV \u2190 V + s(V' - V)   (4g)\n\nThe system is initialized by x_i = x_i(0) (the probe vector) and V = +\u221e. A block diagram of this system appears in Fig. 2. Note that we made use of N + m + 1 neurons and O(Nm) connections.\n\nAs for the continuous time case (with memories on the unit sphere), we will get the equations:\n\ndx_i/dt + 2LVx_i = 2LN \u03a3_{j=1}^{m} w_ji \u03b7_j,   1 \u2264 i \u2264 N   (5a)\n\n\u03c9_j = N \u03a3_{i=1}^{N} w_ji x_i,   1 \u2264 j \u2264 m   (5b)\n\n\u03b7_j = (1 + \u03b4\u00b2 - 2\u03c9_j)^{-(L+1)}, where \u03b4\u00b2 = \u03a3_{i=1}^{N} x_i\u00b2,   1 \u2264 j \u2264 m   (5c)\n\nV = \u03a3_{j=1}^{m} \u03b7_j   (5d)\n\nwith a similar interpretation (here there is no 'control' register, as all components are updated continuously).\n\nFigure 2: Neural Network Implementation. (Legend: delay units (neurons), synaptic switches, and computation units.)\n\nREFERENCES\n\n1. Hopfield, J.J., \"Neural Networks and Physical Systems with Emergent Collective Computational Abilities\", Proc. Nat. Acad. Sci. U.S.A., Vol. 79 (1982), pp. 2554-2558.\n\n2. McEliece, R.J., et al., \"The Capacity of the Hopfield Associative Memory\", IEEE Trans. on Inf. Theory, Vol. IT-33 (1987), pp. 461-482.\n\n3. Dembo, A., \"On the Capacity of the Hopfield Memory\", submitted to IEEE Trans. on Inf. Theory.\n\n4. Kohonen, T., Self-Organization and Associative Memory, Springer, Berlin, 1984.\n\n5. Dembo, A. and Zeitouni, 
O., \"General Potential Surfaces and Neural Networks\", submitted to Phys. Rev. A.\n\n6. Bachmann, C.M., Cooper, L.N., Dembo, A. and Zeitouni, O., \"A Relaxation Model for Memory with High Storage Density\", to appear in Proc. Natl. Acad. Sci.\n", "award": [], "sourceid": 73, "authors": [{"given_name": "Amir", "family_name": "Dembo", "institution": null}, {"given_name": "Ofer", "family_name": "Zeitouni", "institution": null}]}