{"title": "History-Dependent Attractor Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 572, "page_last": 579, "abstract": null, "full_text": "History-dependent Attractor Neural \n\nNetworks \n\nIsaac Meilijson \n\nEytan Ruppin \n\nSchool of Mathematical Sciences \n\nRaymond and Beverly Sackler Faculty of Exact Sciences \n\nTel-A viv University, 69978 Tel-Aviv, Israel. \n\nAbstract \n\nWe present a methodological framework enabling a detailed de(cid:173)\nscription of the performance of Hopfield-like attractor neural net(cid:173)\nworks (ANN) in the first two iterations. Using the Bayesian ap(cid:173)\nproach, we find that performance is improved when a history-based \nterm is included in the neuron's dynamics. A further enhancement \nof the network's performance is achieved by judiciously choosing \nthe censored neurons (those which become active in a given itera(cid:173)\ntion) on the basis of the magnitude of their post-synaptic poten(cid:173)\ntials. The contribution of biologically plausible, censored, history(cid:173)\ndependent dynamics is especially marked in conditions of low firing \nactivity and sparse connectivity, two important characteristics of \nthe mammalian cortex. \nIn such networks, the performance at(cid:173)\ntained is higher than the performance of two 'independent' iter(cid:173)\nations, which represents an upper bound on the performance of \nhistory-independent networks. \n\n1 \n\nIntroduction \n\nAssociative Attractor Neural Network (ANN) models provide a theoretical back(cid:173)\nground for the understanding of human memory processes. Considerable effort has \nbeen devoted recently to narrow the gap between the original ANN Hopfield model \n(Hopfield 1982) and the realm of the structure and dynamics of the brain (e.g., \nAmit & Tsodyks 1991). 
In this paper, we contribute to the examination of the performance of ANNs under cortical-like architectures, where neurons are typically connected to only a fraction of their neighboring neurons and have a low firing activity (Abeles et al. 1990). We develop a general framework for examining various signalling mechanisms (firing functions) and activation rules (the mechanism for deciding which neurons are active in some interval of time).

The Hopfield model is based on memoryless dynamics, which identify the notion of 'post-synaptic potential' with the input field received by a neuron from the neurons active in the current iteration. We follow a Bayesian approach under which the neuron's signalling and activation decisions are based on the current a-posteriori probabilities assigned to its two possible true memory states, $\pm 1$. As we shall see, the a-posteriori belief in +1 is the sigmoidal function evaluated at the neuron's generalized field, a linear combination of present and past input fields. From a biological perspective, this history-dependent approach is strongly motivated by the observation that the time span of the different channel conductances in a given neuron is very broad (see Lytton 1991 for a review). While some channels are active for only microseconds, some slow-acting channels may remain open for seconds. Hence, a synaptic input currently impinging on the neuron may influence both its current post-synaptic membrane potential and its post-synaptic potential at some future time.

2 The Model

The neural network model presented is characterized as follows. There are $m$ 'random memories' $\xi^\mu$, $1 \le \mu \le m$, and one 'true' memory $\xi^{m+1} = \xi$. The $(m+1)N$ entries of these memories are independent and identically distributed, with equally likely values of +1 or -1. 
The initial state $X$ has similarity $P(X_i = \xi_i) = (1+\varepsilon)/2$, $P(X_i = -\xi_i) = (1-\varepsilon)/2$, independently of everything else. The weight of the synaptic connection between neurons $i$ and $j$ ($i \ne j$) is given by the simple Hebbian law

$$W_{ij}^* = \sum_{\mu=1}^{m+1} \xi_i^\mu \xi_j^\mu \qquad (1)$$

Each neuron receives incoming synaptic connections from a random choice of $K$ of the $N$ neurons in the network, in such a way that if a synapse exists, the synapse in the opposite direction exists with probability $r$, the reflexivity parameter. In the first iteration, a random sample of $L_1$ neurons become active (i.e., 'fire'); thus on the average $n_1 = L_1 K/N$ neurons update the state of each neuron. The field $f_i^{(1)}$ of neuron $i$ in the first iteration is

$$f_i^{(1)} = \frac{1}{n_1} \sum_{j=1}^{N} W_{ij}^*\, I_{ij}\, I_j^{(1)} X_j \qquad (2)$$

where $I_{ij}$ denotes the indicator function of the event 'neuron $i$ receives a synaptic connection from neuron $j$', and $I_j^{(t)}$ denotes the indicator function of the event 'neuron $j$ is active in the $t$'th iteration'. Under the Bayesian approach we adopt, neuron $i$ assigns an a-priori probability $\lambda_i^{(0)} = P(\xi_i = +1 \mid X_i) = (1 + \varepsilon X_i)/2$ to having +1 as the correct memory state and evaluates the corresponding a-posteriori probability $\lambda_i^{(1)} = P(\xi_i = +1 \mid X_i, f_i^{(1)})$, which turns out to be expressible as the sigmoidal function $1/(1 + \exp(-2x))$ evaluated at some linear combination of $X_i$ and $f_i^{(1)}$.

In the second iteration the belief $\lambda_i^{(1)}$ of a neuron determines the probability that the neuron is active. We illustrate two extreme modes for determining the active updating neurons, or activation: the random case, where $L_2$ active neurons are randomly chosen, independently of the strength of their fields, and the censored case, which consists of selecting the $L_2$ neurons whose belief belongs to some set. 
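The setup of equations (1)-(2) can be sketched in a few lines of NumPy. This is an illustrative toy only: the dimensions below are ours, not the paper's, and the reflexivity parameter $r$ is ignored (each neuron simply samples $K$ presynaptic partners independently).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (the paper's experiments use e.g. N = 1500, K = 50, m = 5).
N, m, K = 200, 5, 50
eps = 0.5                      # initial similarity parameter: P(X_i = xi_i) = (1+eps)/2

# m random memories plus one 'true' memory xi = xi^{m+1}, entries +/-1.
memories = rng.choice([-1, 1], size=(m + 1, N))
xi = memories[-1]

# Hebbian weights, eq. (1): W_ij = sum_mu xi^mu_i xi^mu_j, zero diagonal.
W = memories.T @ memories
np.fill_diagonal(W, 0)

# Initial state X: agrees with xi at each site with probability (1+eps)/2.
X = xi * rng.choice([1, -1], size=N, p=[(1 + eps) / 2, (1 - eps) / 2])

# Connectivity: each neuron receives synapses from K of the N neurons.
I_conn = np.stack([rng.permutation(N) < K for _ in range(N)])

# First iteration: L1 randomly chosen neurons fire; n1 = L1*K/N on average.
L1 = N // 2
active1 = rng.permutation(N) < L1
n1 = L1 * K / N

# Field of eq. (2): f_i^(1) = (1/n1) * sum_j W_ij * I_ij * I_j^(1) * X_j.
f1 = (W * I_conn * active1) @ X / n1
```

Each neuron's field `f1[i]` is then fed into the sigmoidal posterior update described next.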
\nThe most appealing censoring rule from the biological point of view is tail-censoring, \nwhere the active neurons are those with the strongest beliefs. Performance, however, \nis improved under interval-censoring, where the active neurons are those with mid(cid:173)\nrange beliefs, and even further by combining tail and interval censoring into a hybrid \nrule. \nLet n2 = L2 f{ / N . The activation rule is given by a function C : [~, 1] -+ [0, 1] . \nNeuron j, with belief A/I) in +1, becomes active with probability C(maxp/I), 1-\nAj (1\u00bb)), independently of everything else. For example, the random case corresponds \n\nto C = In and the tail-censored case corresponds to C(A) = 1 or 0 depending on \n\nwhether max(A, 1 - A) exceeds some threshold. The output of an active neuron j \nis a signal function S(A/ I \u00bb) of its current belief. The field f/ 2 ) of neuron i in the \nsecond iteration is \n\nf \u00b7(2) - ~ \"\" w. .. * /- \u00b7I \u00b7(2)S(A \u00b7(1\u00bb) \n, \n\nI } } \n\nI} \n\n-\n\nN \nL...J \nn2 j=1 \n\n} \n\n\u2022 \n\n(3) \n\nNeuron i now evaluates its a-posteriori belief \nAi(2) = p(ei = +1IXi,Ii(l),f/I\\I/2\u00bb). As we shall see, Ai(2) is, again, the \nsigmoidal function evaluated at some linear combination of the neuron 's history \nXi, XJi(I), f/ I ) and 1/2). In contrast to the common history-independent Hopfield \ndynamics where the signal emitted by neuron j in the t'th iteration is a function \nof /jet-I) only, Bayesian history-dependent dynamics involve signals and activation \nrules which depend on the neuron's generalized field, obtained by adaptively incor(cid:173)\nporating /j(t-I) to its previous generalized field. The final state X/ 2 ) of neuron i \nis taken as -lor +1, depending on which of 1 - A/ 2) and A/2 ) exceeds 1/2. 
\nFor nI/N, ndN, m/N, K/N constant, and N large, we develop explicit expressions \nfor the performance of the network, for any signal function (e.g., SI(A) = Sgn(A-\n1/2) or S2(A) = 2A - 1) and activation rule. Performance is measured by the final \noverlap e\" = ~ L eiX/2) (or equivalently by the final similarity (1+e\")/2). Various \npossible combina.tions of activation modes and signal functions described above are \nthen examined under varying degrees of connectivity and neuronal activity. \n\n3 Single-iteration optimization: the Bayesian approach \n\nConsider the following well known basic fact in Bayesian Hypothesis Testing, \nLemma 1 \nExpress the prior probability as \n\npee = 1) = ---:--\n1 + e- 2x \n\n1 \n\n(4) \n\n\fHistory-dependent Attractor Neural Networks \n\n575 \n\nand assume an observable Y which, given e, is distributed according to \n\nYle\", N(lle, (12) \n\n(5) \n\n~or some constants Il E (-00,00) and (12 E (0,00). Then the posterior probability \nIS \n\nP(C = llY = y) = \u00ab ) ). \n\n1 + e-2 x+ 1-'/0'2 y \n\nI, \n\n1 \n\nApplying this Lemma to Y = fi(I), with Il = e and (12 = .ill. = al, we see that \n\nnl \n\n(6) \n\n(7) \n\n~i(1) = p(ei = 11Xi , fi(I)) = __ ---:-_1_--:-:---:-\n1 + e- 2f (\"Y(f)X,+f,(1)/0:'1) \n\n, \n\nwhere i( f) = if log ~:!:~. Hence, pee = 11Xi , f/ 1)) > 1/2 if and only if 1/1) + \nan( f)Xi > O. The single-iteration performance is then given by the similarity \n\n(8) \n\n1 ;f~ (;.\" +1(f)fo1) + 1; '~ (;.\" -1(f)fo1) \n\n= Q(e, at) \n\nwhere * is the standard normal distribution function. The Hopfield dynamics, mod(cid:173)\nified by redefining Wii as mi(e) (in the Neural Network terminology) is equivalent \n(in the Bayesia.n jargon) to the obvious optimal policy, under which a neuron sets \nfor itself the sign with posterior probability above 1/2 of being correct. 
\n\n4 Two-iterations optiInization \n\nFor mathematical convenience, we will relate signals and activation rules to nor(cid:173)\nmalized generalized fields rather than to beliefs. We let \n\nh x - S \n\n( ) -\n\n(1 + e- 2CX ) , p( ) -\n\nx - C max \n\n(( 1 \n1 + e- 2cx ' \n\n1 -\n\n1)) \n\n1 + e- 2cx \n\n(9) \n\n1 \n\nfor C = f/ -Jal. The signal function h is assumed to be odd, and the activation \nfunction p, even . \n\nIn order to evaluate the belief ~/2), we need the conditional distribution of li(2) \ngiven Xi, 1/1) and 1/1), for ei = -1 or ei = + 1. We adopt the working a.ssumption \nthat the pair of random variables (Ii (1), h(2)) has a bivariate normal distribution \ngiven ei, 1/1) and Xi, with ei, 1/1) and Xi affecting means but not va.riances or \ncorrelations. Under this working assumption, fi(2) is conditionally normal given \n(ei,1/ 1),Xi,I/ 1)), with constant variance and a mean which we will identify. This \nworking assumption allows us to model performance via the following well known \nregression model. \n\n\f576 \n\nMeilijson and Ruppin \n\nLemma 2 \nIf two random variables U and V with finite variances are such that E(VIU) is a \nlinear function of U and Var(VIU) is constant, then \n\nE(VIU) = E(V) + Cov(U, V) (U - E(U\u00bb \n\nVar(U) \n\n(10) \n\nand \n\nLetting U = fi(l) and V = f/ 2), we obtain \n\nA/ 2) = pee; = llXi, h(l>, li(l), 1/2\u00bb = \n\n(12) \n\n1 \n\n1 + exp{ -2 [E (// 1) la1 + {(E)Xi ) + f*;;af (fi(2) - bXa/1) - afi (1\u00bb)]} -\n\n1 + exp{ -2 [ (q( E) -\n\n1 \n\nb(f*T;af) I/ 1\u00bb) Xi + (afl - a(f:;af\u00bb) li(l) + f* ;;af 1/2 )] } \n\nwhich is the sigmoidal function evaluated at some generalized field. Expression (12) \nshows that the correct definition of a final state Xi (2), as the most likely value among \n+1 or -1, is \n\nX .(2) _ S \n\n, \n\n-\n\ngn \n\nE{ f \n\n[( () _ b(E\" - af)I.Cl\u00bb) \n\n2 \n\nT \n\nI \n\n. 
(~_ a(E\" - aE\u00bb) J.\".(1) \n\n2 \n\nh \n\nX, + \n\nal \n\nT \n\nE\" - aE f .(2)j \n\nI \n\n2 \n\n+ \n\nT \n\n(13) \n\nand the performance is given by \n\np(X/2) = eilei) = 1; E4> (R + {(E)N) + 1 ~ E4> (R -{(E)~) = \n\nwhere the one-iteration performance function Q is defined by (8), and \n\nQ( E, a\") \n\n\" m \na =-\nn\" \n\nm \n\n(14) \n\n(15) \n\nWe see that the performance is conveniently expressed as the single-iteration optimal \nperformance, had this iteration involved n\" rather than nl sampled neurons. This \nformula yields a numerical and analytical tool to assess the network 's performance \nwith different signal functions, activation rules and architectures. Due to space \nrestrictions, the identification of the various parameters used in the above formulas \nis not presented. However, it can be shown that in the sparse limit arrived at \nby fixing al and a2 and letting both J{ 1m and N I J{ go to infinity, it is always \nbetter to replace an iteration by two smaller ones. This suggests that Bayesian \n\n\fupdating dynamics should be essentially asynchronous. We also show that the \n\nHistory-dependent Attractor Neural Networks \n\n577 \n\ntwo-iterations performance Q (c, -L+-L(~)2) is superior to the performance \nQ (2Q( c, al) - 1, (2) of two independent optimal single iterations. \n\nCIt 1 \n\ncw2 \n\n~ \n\n5 Heuristics on activation and signalling \n\nU \n\n1 \n\n-1 \n\n-4 \n\nFigure 1: A typical plot of R(x) = \u00a2l(X)/\u00a2O(x). Network parameters are N = 500, \nK = 500, n1 = n2 = 50 and m = 10. \nBy (14) and (15), performance is mostly determined by the magnitude of (col< - ac)2. \nIt can be shown that \n\nand \n\n~a = 100 p(x)\u00a2o(x)dx \n\n(16) \n\n(17) \n\nwhere \u00a2l and \u00a2o are some specific linear combinations of Gaussian densities and \ntheir derivatives , and ~ a = n2/ K is the activity level. High performance is achieved \nby maximizing over p and possibly over h the absolute value of expression (16) \n~eeping (17) fixed. 
In complete analogy to Hypothesis Testing in Statistics, where $w_a$ takes the role of level of significance and $(\varepsilon^* - a\varepsilon)w_a$ the role of power, $p(x)$ should be 1 or 0 (activate the neuron or don't) depending on whether the field value $x$ is such that the likelihood ratio $h(x)\phi_1(x)/\phi_0(x)$ is above or below a given threshold, determined by (17). Omitting details, the ratio $R(x) = \phi_1(x)/\phi_0(x)$ looks as in figure 1, and converges to $-\infty$ as $x \to \infty$.

We see that there are three reasonable ways to make the ratio $h(x)\phi_1(x)/\phi_0(x)$ large: we can take a negative threshold such as $t_1$ in figure 1, activate all neurons with generalized field exceeding $\beta_3$ (tail-censoring) and signal $h(x) = -\mathrm{Sgn}(x)$; or take a positive threshold such as $t_2$, activate all neurons with field value between $\beta_1$ and $\beta_2$ (interval-censoring) and signal $h(x) = \mathrm{Sgn}(x)$. Better still, we can consider the hybrid signalling-censoring rule: activate all neurons with absolute field value between $\beta_1$ and $\beta_2$, or beyond $\beta_3$. The first group should signal their preferred sign, while those in the second group should signal the sign opposite to the one they so strongly believe in!

6 Numerical results

Performance                   | predicted | experimental
Random activation             | 0.955     | 0.951
Tail censoring                | 0.972     | 0.973
Interval/Hybrid censoring     | 0.975     | 0.972
Hopfield - zero diagonal      | -         | 0.902, 0.973
Independent gamma(e) diagonal | 0.96      | -
Independent zero diagonal     | 0.913     | -

Table 1: Sparsely connected, low-activity network: $N = 1500$, $K = 50$, $n_1 = n_2 = 20$, $m = 5$.

Figure 2: Performance of a large-scale cortical-like 'columnar' ANN, at different values of connectivity $K$, for initial similarity 0.75. $N = 10^5$, $n_1 = n_2 = 200$, $m = 50$. The curves show random activation, tail-censoring, interval-censoring and hybrid censoring-signalling; the horizontal line denotes the performance of a single iteration.

Our theoretical performance predictions show good correspondence with simulation results, already at fairly small-scale networks. The superiority of history-dependent dynamics is apparent. Table 1 shows the performance achieved in a sparsely-connected network. The predicted similarity after two iterations is reported, starting from initial similarity 0.75, and compared with experimental results averaged over 100 trials.

Figure 2 illustrates the theoretical two-iterations performance of large, low-activity 'cortical-like' networks, as a function of connectivity. We see that interval-censoring can maintain high performance throughout the connectivity range. The performance of tail-censoring is very sensitive to connectivity, almost achieving the performance of interval-censoring at a narrow low-connectivity range, and becoming optimal only at very high connectivity. The superior hybrid rule improves on the others only under high connectivity. As a cortical neuron should receive the concomitant firing of about 200-300 neurons in order to be activated (Treves & Rolls 1991), we have set $n = 200$. We find that the optimal connectivity per neuron, for biologically plausible tail-censoring activation, is of the same order of magnitude as actual cortical connectivity. 
The actual number $nN/K$ of neurons firing in every iteration is about 5000, which is in close correspondence with the evidence suggesting that about 4% of the neurons in a module fire at any given moment (Abeles et al. 1990).

References

[1] J.J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. USA, 79:2554, 1982.

[2] D.J. Amit and M.V. Tsodyks. Quantitative study of attractor neural network retrieving at low spike rates: I. Substrate-spikes, rates and neuronal gain. Network, 2:259-273, 1991.

[3] M. Abeles, E. Vaadia, and H. Bergman. Firing patterns of single units in the prefrontal cortex and neural network models. Network, 1:13-25, 1990.

[4] W. Lytton. Simulations of cortical pyramidal neurons synchronized by inhibitory interneurons. J. Neurophysiol., 66(3):1059-1079, 1991.

[5] A. Treves and E.T. Rolls. What determines the capacity of autoassociative memories in the brain? Network, 2:371-397, 1991.
", "award": [], "sourceid": 627, "authors": [{"given_name": "Isaac", "family_name": "Meilijson", "institution": null}, {"given_name": "Eytan", "family_name": "Ruppin", "institution": null}]}