{"title": "An Attractor Neural Network Model of Recall and Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 642, "page_last": 648, "abstract": null, "full_text": "An Attractor Neural Network Model of Recall \n\nand Recognition \n\nEytan Ruppin \n\nDepartment of Computer Science \nSchool of Mathematical Sciences \nSackler Faculty of Exact Sciences \nTel Aviv University \n69978, Tel Aviv, Israel \n\nYechezkel Yeshurun \n\nDepartment of Computer Science \nSchool of Mathematical Sciences \nSackler Faculty of Exact Sciences \n\nTel Aviv University \n\n69978, Tel Aviv, Israel \n\nAbstract \n\nThis work presents an Attractor Neural Network (ANN) model of Re(cid:173)\ncall and Recognition. It is shown that an ANN model can qualitatively \naccount for a wide range of experimental psychological data pertaining \nto the these two main aspects of memory access. Certain psychological \nphenomena are accounted for, including the effects of list-length, word(cid:173)\nfrequency, presentation time, context shift, and aging. Thereafter, the \nprobabilities of successful Recall and Recognition are estimated, in order \nto possibly enable further quantitative examination of the model. \n\n1 Motivation \n\nThe goal of this paper is to demonstrate that a Hopfield-based [Hop82] ANN model \ncan qualitatively account for a wide range of experimental psychological data per(cid:173)\ntaining to the two main aspects of memory access, Recall and Recognition. Recall \nis defined as the ability to retrieve an item from a list of items (words) originally \npresented during a previous learning phase, given an appropriate cue (cued RecalQ, \nor spontaneously (free RecalQ. Recognition is defined as the ability to successfully \nacknowledge that a certain item has or has not appeared in the tutorial list learned \nbefore. \n\nThe main prospects of ANN modeling is that some parameter values, that in former, \n'classical' models of memory retrieval (see e.g. 
[GS84]) had to be explicitly assigned, can now be shown to be emergent properties of the model. \n\n2 The Model \n\nThe model consists of a Hopfield ANN, in which distributed patterns representing the learned items are stored during the learning phase and are later presented as inputs during the test phase. In this framework, successful Recall and Recognition is defined. Some additional components are added to the basic Hopfield model to enable the modeling of the relevant psychological phenomena. \n\n2.1 The Hopfield Model \n\nThe Hopfield model's dynamics are composed of a non-linear, iterative, asynchronous transformation of the network state [Hop82]. The process may include a stochastic noise which is analogous to the 'temperature' T in statistical mechanics. Formally, the Hopfield model is described as follows: Let neuron i's state be a binary variable S_i, taking the values ±1, denoting a firing or a resting state, respectively. Let the network's state be a vector S specifying the binary values of all its neurons. Let J_ij be the synaptic strength between neurons i and j. Then h_i, the input 'field' of neuron i, is given by h_i = Σ_{j≠i} J_ij S_j. The neuron's dynamic behavior is described by \n\nS_i(t+1) = +1 with probability (1/2)(1 + tanh(h_i/T)), and S_i(t+1) = -1 with probability (1/2)(1 - tanh(h_i/T)). \n\nStoring a new memory pattern ξ^μ in the network is performed by modifying every ij element of the synaptic connection matrix according to J_ij^new = J_ij^old + (1/n) ξ^μ_i ξ^μ_j. A Hopfield network will always converge to a stable state, and every stored memory is an attractor having an area surrounding it termed its basin of attraction [Hop82]. In addition to the stored memories, other, non-memory states also exist as stable states (local minima) of the network [AGS85]. 
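The storage and update rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it uses the deterministic T → 0 limit of the stochastic update rule (S_i ← sign(h_i)), and the network size, number of patterns, and cue-corruption level are arbitrary choices.

```python
import random

def store(patterns, n):
    """Hebbian storage: J_ij <- J_ij + (1/n) * xi_i * xi_j for each pattern xi."""
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / n
    return J

def recall(J, cue, sweeps=20):
    """Asynchronous updates in the deterministic (T -> 0) limit: S_i <- sign(h_i)."""
    S = list(cue)
    n = len(S)
    for _ in range(sweeps):
        changed = False
        for i in random.sample(range(n), n):  # update neurons in random order
            h = sum(J[i][j] * S[j] for j in range(n))
            s = 1 if h >= 0 else -1
            if s != S[i]:
                S[i], changed = s, True
        if not changed:  # no neuron flipped: a stable state (attractor) was reached
            break
    return S

random.seed(0)
n = 100
memories = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(3)]
J = store(memories, n)
cue = list(memories[0])
for i in random.sample(range(n), 10):  # corrupt 10 bits of the cue
    cue[i] = -cue[i]
out = recall(J, cue)
overlap = sum(a * b for a, b in zip(out, memories[0])) / n
print(overlap)
```

With a memory load this far below the critical capacity, the cue lies well inside the basin of attraction of the stored pattern, so the final overlap should be close to 1.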
The maximal number m of (randomly generated) memory patterns which can be stored in the basic Hopfield network of n neurons is m = α_c · n, α_c ≈ 0.14 [AGS85]. \n\n2.2 Recall and Recognition in the model's framework \n\n2.2.1 Recall \n\nRecall is considered successful when, upon starting from an initial cue, the network converges to a stable state which corresponds to the learned memory nearest to the input pattern. Inter-pattern distance is measured by the Hamming distance between the input and the learned item encodings. If the network converges to a non-memory stable state, its output will stand for a 'failure of recall' response.¹ \n\n¹The question of \"How do such non-memory states bear the meaning of 'recall failure'?\" is beyond the scope of this work. However, a possible explanation is that during the learning phase 'meaning' is assigned to the stored patterns via connections formed with external patterns; since non-memory states lack such associations with external patterns, they are 'meaningless', yielding the 'recall failure' response. Another possible mechanism is that every output pattern generated in the recall process also passes a recognition phase, so that non-memory states are rejected (see the following paragraph describing recognition in our model). \n\n2.2.2 Recognition \n\nRecognition is considered successful when the network arrives at a stable state during a time interval Δ, beginning from input presentation. In general, the shorter the distance between an input and its nearest memory, the faster is its convergence [AM88, KP88, RY90]. Since non-memory (non-learned) stable states have higher energy levels and much shallower basins of attraction than memorized stable states [AGS85, LN89], convergence to such states takes significantly longer time. Therefore, there exists a range of possible values of Δ that enable successful recognition only of inputs similar to one of the stored memories. \n\n2.3 Other features of the model \n\n• The context of the psychological experiments is represented as a substring of the input's encoding. In order to minimize inter-pattern correlation, the size of the context encoding relative to the total size of the memory encoding is kept small. \n\n• The total associational linkage of a learned item is modeled as an external field vector E. When a learned memory pattern ξ^μ is presented to the network, the value of the external field vector generated is E_i = h · ξ^μ_i, where h is an 'orientation' coefficient expressing the association strength. \n\nAdditional features, including a modified storage equation accounting for learning taking place at the test phase, and a storage decay parameter, are described in [RY90]. \n\n3 The Modeling of experimental data. \n\nRegarding every phenomenon discussed, a brief description of the psychological findings is followed by an account of its modeling. We rely on the known results pertaining to Hopfield models to show that, qualitatively, the psychological phenomena reviewed are emergent properties of the model. When such analytical evidence is lacking, simulations were performed in order to account for the experimental data. For a review of the psychological literature supporting the findings modeled, see [GS84]. \n\nThe List-Length Effect: It is known that the probability of successful Recall or Recognition of a particular item decreases as the length of the list of learned items increases. List length is expressed in memory load. Since it has been shown that the width of the memories' basins of attraction monotonically decreases following an approximately inverse parabolic curve [Wei85], Recall performance should decrease as memory load is increased. 
We have examined the convergence time of the same set of input patterns at different values of memory load. As demonstrated in Fig. 1, it was found that, as the memory load is increased, successful convergence has occurred (on the average) only after an increasingly growing number of asynchronous iterations. Hence, convergence takes more time and can result in Recognition failure, although the memories' stability is maintained till the critical capacity α_c is reached. \n\nFigure 1: Recognition speed (no. of asynchronous iterations) as a function of memory load (no. of stored memories). The network's parameters are n = 500, T = 0.28. \n\nThe word-frequency effect: The more frequent a word is in language, the probability of recalling it increases, while the probability of recognizing it decreases. A word's frequency in the language is assumed to affect its retrieval through the stored word's semantic relations and associations [Kat85, NCBK87]. It is assumed that, relative to low-frequency words, high-frequency words have more semantic relations and therefore more connections between the patterns representing them and other patterns stored in the memory (i.e., in other networks). This one-to-many relationship is assumed to be reciprocal, i.e., each of the externally stored patterns also has connections projected to several of the stored patterns in the allocated network. \n\nThe process leading to the formation of the external field E (acting upon the allocated network), generated by an input pattern nearest to some stored memory pattern ξ^μ, is assumed to be characterized as follows: \n\n1. 
There is a threshold degree of overlap O_min, such that E > 0 only when the allocated network's state overlap H^μ is higher than O_min. \n\n2. At overlap values H^μ which are only moderately larger than O_min, h^μ is monotonically increasing, but as H^μ continues to rise, a certain 'optimal' point is reached, beyond which h^μ is monotonically decreasing. \n\n3. High-frequency words have lower O_min values than low-frequency words. \n\nRecognition tests are characterized by a high initial value of overlap H^μ to some memory ξ^μ. The value of h^μ and E^μ generated is post-optimal, and therefore smaller than in the case of low-frequency words, which have higher O_min values. In Recall tests the initial situation is characterized by low values of overlap H^μ to some nearest memory ξ^μ; only the overlap value of high-frequency words is sufficient for activating associated items, i.e., H^μ > O_min. \n\nPresentation Time: Increasing the presentation time of learned words is known to improve both their Recall and Recognition. This is explained by the phenomenon of maintenance rehearsal: the memories' basins of attraction get deeper, since the 'energy' E of a given state equals Σ_{μ=1}^{m} (H^μ)². Deeper basins of attraction are also wider [HFP83, KPKP90]. Therefore, the probability of successful Recall and Recognition of rehearsed items is increased. The effect of a uniform rehearsal is equivalent to a temperature decrease. Hence, increasing presentation time will attenuate and delay the List-length phenomenon, till a certain limit. In a similar way, the Test Delay phenomenon is accounted for [RY90]. \n\nContext Shift: The term Context Shift refers to the change in context from the tutorial period to the test period. Studies examining the effect of context shift have shown a decrement in Recall performance with context shift, but little change in Recognition performance. 
\nAs demonstrated in [RY90], when a context shift is simulated by flipping some of the context string's bits, Recall performance severely deteriorates while the memories' stability remains intact. No significant increase in the time (i.e., number of asynchronous iterations) required for convergence was found, thus maintaining the pre-shift probability of successful Recognition. \n\nAge differences in Recall and Recognition: It was found that older people perform more poorly on Recall tasks than they do on Recognition tasks [CM87]. These findings can be accounted for by the assumption that synapses are being weakened and deleted with aging, which, although being controversial, has gained some experimental support (see [RY90]). We have investigated the retrieval performance as a function of the input's initial overlap, various levels of synaptic dilution, and memory load: As demonstrated in Fig. 2, when the synaptic dilution is increased, a 'critical' phase is reached where memory retrieval of far-away input patterns is decreased but the retrieval of input patterns with a high level of initial overlap remains intact. As the memory load is increased, this 'critical' phase begins at lower levels of synaptic dilution. On the other hand, only a mild increase (of 15%) in recognition speed was found. \n\nFigure 2: The probability of successful retrieval as a function of memory load and the input pattern's initial overlap, at two different degrees of synaptic dilution (right- and left-sided figures). The network's parameters are n = 500, T = 0.05. 
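The synaptic-dilution account of aging can be illustrated with a small simulation. This is a hedged sketch rather than the authors' original experiment: the network size, load, dilution fraction, and the deterministic (T → 0) dynamics are all arbitrary choices, not the parameters of Fig. 2.

```python
import random

def diluted_overlap(dilution, flips, n=100, m=3, sweeps=20, seed=1):
    """Store m random patterns, zero out a random fraction `dilution` of the
    symmetric synapses (modeling synaptic deletion with aging), corrupt `flips`
    bits of pattern 0, and return the final overlap with pattern 0 after
    deterministic asynchronous updates."""
    rng = random.Random(seed)
    pats = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    J = [[0.0] * n for _ in range(n)]
    for xi in pats:                       # Hebbian storage (upper triangle)
        for i in range(n):
            for j in range(i + 1, n):
                J[i][j] += xi[i] * xi[j] / n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < dilution:   # delete this synapse
                J[i][j] = 0.0
            J[j][i] = J[i][j]             # keep the matrix symmetric
    S = list(pats[0])
    for i in rng.sample(range(n), flips):  # corrupt the cue
        S[i] = -S[i]
    for _ in range(sweeps):
        changed = False
        for i in rng.sample(range(n), n):
            h = sum(J[i][j] * S[j] for j in range(n))
            s = 1 if h >= 0 else -1
            if s != S[i]:
                S[i], changed = s, True
        if not changed:                   # stable state reached
            break
    return sum(a * b for a, b in zip(S, pats[0])) / n

# Compare retrieval from a near cue and a far-away cue at the same dilution level.
print(diluted_overlap(0.5, flips=5), diluted_overlap(0.5, flips=40))
```

Sweeping `dilution` upward while holding the two cue distances fixed gives a toy version of the effect described above: near cues keep retrieving well past the dilution level at which far-away cues begin to fail.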
\n\n\fAn Attractor Neural Network Model of Recall and Recognition \n\n647 \n\nThe interested reader can find a description of the modeling of additional phe(cid:173)\nnomena, including test position, word fragment completion, and distractor sim(cid:173)\nilarity, in [RY90]. \n\n4 On a quantitative test of the model. \n\n4.1 Estimating Recall performance \n\nIn a given network, with n neurons and m memories, the radius r of the basins \nof attraction of the memories decreases as the memory load parameter (a = min) \nis increased. According to [MPRV87], n, m, and r are related according to the \nexpression m = (1-2.r)l. n \n\n4 \n\nlogn' \n\nThe concept of the basins of attraction implies a non-linear probability function \nwith low probability when input vectors are further than the radius of attraction \nand high probability otherwise. The slope of this non-linearity increases as the noise \nlevel T is decreased. \nThe probability Pc that a random input vector will converge to one of the stored \nmemories can be estimated by Pc ~ l::~~ (~) . m. It is interesting to note that \nthe rates of change of r and of Pc have distinct forms; Recall tests beginning from \nrandomly generated cues would yield a very low rate of successful Recall (Pc). Yet, \nif one examines Recall by picking a stored memory, flipping some of its encoding \nbits, and presenting it as an input to the network (determining r), 'reasonable' \nlevels of successful Recall can still be obtained even when a 'considerable' number \nof encoding bits are flipped. Pc can be also estimated by considering the context \nrepresentation [RY90]. 
\n\n4.2 Estimating Recognition performance \n\nThe probability of correct Recognition depends mainly on the length of the interval Δ. Assume that after an input pattern is presented to a network of n neurons, k iteration steps of a Monte Carlo simulation are performed during the time interval Δ: in each such step, a neuron is randomly selected, and it then examines whether or not it should flip its state, according to its input. We show that the probability Pg(d) that an input pattern at Hamming distance d from a stored memory will be successfully recognized is bounded by Pg(d) ≥ 1 - d · e^{-k/n}. It can be seen that Recognition's success depends strongly on the initial input proximity to a stored memory, and even more strongly on the number of allowed asynchronous iterations k, determined by the length of Δ. For a selection of k = n(ln(d) + c), one obtains Pg ≥ 1 - e^{-c}. The expected number of iterations (denoted E(X)) till successful convergence is achieved is E(X) = Σ_{i=1}^{d} E(X_i) = n · Σ_{i=1}^{d} 1/i ≈ n · ln(d). \n\nIn the more general case, let δ denote the Hamming distance (between the network's state S and a stored memory) below which retrieval is considered successful. Then, the corrected estimations of retrieval performance are Pg ≥ 1 - (d choose δ) · e^{-(k/n)·δ}, and E(X) ≈ n · ln(d/δ). In simulations we have performed (n = 500, d = 20, δ = 10), the average number of iterations until successful convergence was in the range of 300-400, in excellent correspondence with the predicted expectation, E(X) = 500 · ln(2). \n\nReferences \n\n[AGS85] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett., 55:1530, 1985. \n[AM88] S. I. Amari and K. Maginu. Statistical neurodynamics of associative memory. Neural Networks, 1:63, 1988. \n[CM87] F. I. M. Craik and J. M. McDowd. Age differences in recall and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(3):474, 1987. \n[GS84] G. Gillund and M. Shiffrin. A retrieval model for both recognition and recall. Psychological Review, 91:1, 1984. \n[HFP83] J. J. Hopfield, D. I. Feinstein, and R. G. Palmer. 'Unlearning' has a stabilizing effect in collective memories. Nature, 304:158, 1983. \n[Hop82] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. USA, 79:2554, 1982. \n[Kat85] T. Kato. Semantic-memory sources of episodic retrieval failure. Memory & Cognition, 13(5):442, 1985. \n[KP88] J. Komlos and R. Paturi. Convergence results in an associative memory model. Neural Networks, 1:239, 1988. \n[KPKP90] B. Kamgar-Parsi and B. Kamgar-Parsi. On problem solving with Hopfield neural networks. Biol. Cybern., 62:415, 1990. \n[LN89] M. Lewenstein and A. Nowak. Fully connected neural networks with self-control of noise levels. Phys. Rev. Lett., 62(2):225, 1989. \n[MPRV87] R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. S. Venkatesh. The capacity of the Hopfield associative memory. IEEE Transactions on Information Theory, IT-33(4):461, 1987. \n[NCBK87] D. L. Nelson, J. J. Canas, M. T. Bajo, and P. D. Keelan. Comparing word fragment completion and cued recall with letter cues. Journal of Experimental Psychology: Learning, Memory and Cognition, 13(4):542, 1987. \n[RY90] E. Ruppin and Y. Yeshurun. Recall and recognition in an attractor neural network model of memory retrieval. Technical report, Dept. of Computer Science, Tel-Aviv University, 1990. \n[Wei85] G. Weisbuch. Scaling laws for the attractors of Hopfield networks. J. Physique Lett., 46:L-623, 1985. \n\f", "award": [], "sourceid": 307, "authors": [{"given_name": "Eytan", "family_name": "Ruppin", "institution": null}, {"given_name": "Yehezkel", "family_name": "Yeshurun", "institution": null}]}