{"title": "Capacity of strong attractor patterns to model behavioural and cognitive prototypes", "book": "Advances in Neural Information Processing Systems", "page_first": 2661, "page_last": 2669, "abstract": "We solve the mean field equations for a stochastic Hopfield network with temperature (noise) in the presence of strong, i.e., multiply stored, patterns, and use this solution to obtain the storage capacity of such a network. Our result provides for the first time a rigorous solution of the mean field equations for the standard Hopfield model and is in contrast to the mathematically unjustifiable replica technique that has been hitherto used for this derivation. We show that the critical temperature for stability of a strong pattern is equal to its degree or multiplicity, when the sum of the cubes of degrees of all stored patterns is negligible compared to the network size. In the case of a single strong pattern in the presence of simple patterns, when the ratio of the number of all stored patterns and the network size is a positive constant, we obtain the distribution of the overlaps of the patterns with the mean field and deduce that the storage capacity for retrieving a strong pattern exceeds that for retrieving a simple pattern by a multiplicative factor equal to the square of the degree of the strong pattern. This square law property provides justification for using strong patterns to model attachment types and behavioural prototypes in psychology and psychotherapy.", "full_text": "Capacity of strong attractor patterns to model behavioural and cognitive prototypes\n\nAbbas Edalat\n\nDepartment of Computing\nImperial College London\nLondon SW7 2RH, UK\n\nae@ic.ac.uk\n\nAbstract\n\nWe solve the mean field equations for a stochastic Hopfield network with temperature (noise) in the presence of strong, i.e., multiply stored, patterns, and use this solution to obtain the storage capacity of such a network.
Our result provides for the first time a rigorous solution of the mean field equations for the standard Hopfield model and is in contrast to the mathematically unjustifiable replica technique that has been used hitherto for this derivation. We show that the critical temperature for stability of a strong pattern is equal to its degree or multiplicity, when the sum of the squares of degrees of the patterns is negligible compared to the network size. In the case of a single strong pattern, when the ratio of the number of all stored patterns to the network size is a positive constant, we obtain the distribution of the overlaps of the patterns with the mean field and deduce that the storage capacity for retrieving a strong pattern exceeds that for retrieving a simple pattern by a multiplicative factor equal to the square of the degree of the strong pattern. This square law property provides justification for using strong patterns to model attachment types and behavioural prototypes in psychology and psychotherapy.\n\n1 Introduction: Multiply learned patterns in Hopfield networks\n\nThe Hopfield network as a model of associative memory and unsupervised learning was introduced in [23] and has been intensively studied from a wide range of viewpoints in the past thirty years. However, properties of a strong pattern, i.e., a pattern that has been multiply stored or learned in these networks, have only been examined very recently, a surprising delay given that repetition of an activity is the basis of learning by the Hebbian rule and long term potentiation.
In particular, while the storage capacity of a Hopfield network with certain correlated patterns has been tackled [13, 25], the storage capacity of a Hopfield network in the presence of strong as well as random patterns has not hitherto been addressed.\n\nThe notion of a strong pattern of a Hopfield network has been proposed in [15] to model attachment types and behavioural prototypes in developmental psychology and psychotherapy. This suggestion has been motivated by reviewing the pioneering work of Bowlby [9] in attachment theory and highlighting how a number of academic biologists, psychiatrists, psychologists, sociologists and neuroscientists have consistently regarded Hopfield-like artificial neural networks as suitable tools to model cognitive and behavioural constructs as patterns that are deeply and repeatedly learned by individuals [11, 22, 24, 30, 29, 10].\n\nA number of mathematical properties of strong patterns in Hopfield networks, which give rise to strong attractors, have been derived in [15]. These show in particular that strong attractors are strongly stable; a series of experiments has also been carried out which confirms the mathematical results and also indicates that a strong pattern stored in the network can be retrieved even in the presence of a large number of simple patterns, far exceeding the well-known maximum load parameter or storage capacity of the Hopfield network with random patterns ($\\alpha_c \\approx 0.138$).\n\nIn this paper, we consider strong patterns in the stochastic Hopfield model with temperature, which accounts for various types of noise in the network. In these networks, the updating rule is probabilistic and depends on the temperature. Since an analytical solution of such a system is not possible in general, one strives to obtain the average behaviour of the network when the input to each node, the so-called field at the node, is replaced with its mean.
This is the basis of mean field theory for these networks.\n\nDue to the close connection between the Hopfield network and the Ising model in ferromagnetism [1, 8], the mean field approach for the Hopfield network and its variations has been tackled using the replica method, starting with the pioneering work of Amit, Gutfreund and Sompolinsky [3, 2, 4, 19, 31, 1, 13]. Although this method has been widely used in the theory of spin glasses in statistical physics [26, 16], its mathematical justification has proved to be elusive, as we will discuss in the next section; see for example [20, page 264], [14, page 27], and [7, page 9].\n\nIn [17], and independently in [27], an alternative technique to the replica method for solving the mean field equations has been proposed, which is reproduced and characterised as heuristic in [20, section 2.5] since it relies on a number of assumptions that are not later justified and uses a number of mathematical steps that are not validated.\n\nHere, we use the basic idea of the above heuristic to develop a verifiable mathematical framework with provable results grounded on elements of probability theory, with which we assume the reader is familiar. This technique allows us to solve the mean field equations for the Hopfield network in the presence of strong patterns and use the results to study, first, the stability of these patterns in the presence of temperature (noise) and, second, the storage capacity of the network with a single strong pattern at temperature zero.\n\nWe show that the critical temperature for the stability of a strong pattern is equal to its degree (i.e., its multiplicity) when the ratio of the sum of the squares of degrees of the patterns to the network size tends to zero as the latter tends to infinity.
In the case that there is only one strong pattern present, with its degree small compared to the number of patterns, and the latter is a fixed multiple of the number of nodes, we find the distribution of the overlap of the mean field and the patterns when the strong pattern is being retrieved. We use these distributions to prove that the storage capacity for retrieving a strong pattern exceeds that for a simple pattern by a multiplicative factor equal to the square of the degree of the strong attractor. This result matches the finding in [15] regarding the capacity of a network to recall strong patterns, as mentioned above. Our results therefore show that strong patterns are robust and persistent in the network memory, as attachment types and behavioural prototypes are in the human memory system.\n\nIn this paper, we will use Lyapunov\u2019s theorem in probability several times; it provides a simple sufficient condition to generalise the Central Limit Theorem when we deal with independent but not necessarily identically distributed random variables. We require a general form of this theorem as follows. Let $Y_n = \\sum_{i=1}^{k_n} Y_{ni}$, for $n \\in \\mathbb{N}$, be a triangular array of random variables such that for each $n$, the random variables $Y_{ni}$, for $1 \\le i \\le k_n$, are independent with $E(Y_{ni}) = 0$ and $E(Y_{ni}^2) = \\sigma_{ni}^2$, where $E(X)$ stands for the expected value of the random variable $X$. Let $s_n^2 = \\sum_{i=1}^{k_n} \\sigma_{ni}^2$. We use the notation $X \\sim Y$ when the two random variables $X$ and $Y$ have the same distribution (for large $n$ if either or both of them depend on $n$).\n\nTheorem 1.1 (Lyapunov\u2019s theorem [6, page 368]) If for some $\\delta > 0$ we have the condition\n$$\\frac{1}{s_n^{2+\\delta}} \\sum_{i=1}^{k_n} E\\left(|Y_{ni}|^{2+\\delta}\\right) \\to 0 \\quad \\text{as } n \\to \\infty,$$\nthen $\\frac{1}{s_n} Y_n \\xrightarrow{d} N(0, 1)$ as $n \\to \\infty$, where $\\xrightarrow{d}$ denotes convergence in distribution, and we denote by $N(a, \\sigma^2)$ the normal distribution with mean $a$ and variance $\\sigma^2$. Thus, for large $n$ we have $Y_n \\sim N(0, s_n^2)$.\n\n2 Mean field theory\n\nWe consider a Hopfield network with $N$ neurons $i = 1, \\ldots, N$ with values $S_i = \\pm 1$ and follow the notations in [20]. As in [15], we assume patterns can be multiply stored, and the degree of a pattern is defined as its multiplicity. The total number of patterns, counting their multiplicity, is denoted by $p$, and we assume there are $n$ patterns $\\xi^1, \\ldots, \\xi^n$ with degrees $d_1, \\ldots, d_n \\ge 1$ respectively and that the remaining $p - \\sum_{k=1}^n d_k \\ge 0$ patterns are simple, i.e., each has degree one. Note that by our assumptions there are precisely\n$$p_0 = p + n - \\sum_{k=1}^n d_k$$\ndistinct patterns, which we assume are independent and identically distributed with equal probability of taking value $\\pm 1$ for each node. More generally, for any non-negative integer $k \\in \\mathbb{N}$, we let\n$$p_k = \\sum_{\\mu=1}^{p_0} d_\\mu^k.$$\nWe use the generalized Hebbian rule for the synaptic couplings: $w_{ij} = \\frac{1}{N} \\sum_{\\mu=1}^{p_0} d_\\mu \\xi^\\mu_i \\xi^\\mu_j$ for $i \\ne j$, with $w_{ii} = 0$ for $1 \\le i, j \\le N$. As in the standard stochastic Hopfield model [20], we use Glauber dynamics [18] for the stochastic updating rule with pseudo-temperature $T > 0$, which accounts for various types of noise in the network, and assume zero bias in the local field. Putting $\\beta = 1/T$ (i.e., with the Boltzmann constant $k_B = 1$) and letting $f_\\beta(h) = 1/(1 + \\exp(-2\\beta h))$, the stochastic updating rule at time $t$ is given by\n$$\\Pr(S_i(t + 1) = \\pm 1) = f_\\beta(\\pm h_i(t)), \\quad \\text{where } h_i(t) = \\sum_{j=1}^N w_{ij} S_j(t) \\tag{1}$$\nis the local field at $i$ at time $t$.
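As an illustration only, the generalized Hebbian couplings and the Glauber rule of Equation (1) can be simulated for a small network. The sketch below is a hypothetical Python example, not code from this paper: the function names (hebbian_weights, glauber_sweep, overlap) and all parameter values are assumptions, and f_β is evaluated through the identity f_β(h) = (1 + tanh(βh))/2 to avoid overflow at low temperature.

```python
import math
import random


def hebbian_weights(patterns, degrees):
    # Generalized Hebbian rule: w_ij = (1/N) sum_mu d_mu xi^mu_i xi^mu_j for i != j, and w_ii = 0.
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for xi, d in zip(patterns, degrees):
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += d * xi[i] * xi[j] / n
    return w


def glauber_sweep(state, w, beta, rng):
    # One sweep of asynchronous Glauber updates, visiting the nodes in random order.
    n = len(state)
    order = list(range(n))
    rng.shuffle(order)
    for i in order:
        h = sum(w[i][j] * state[j] for j in range(n))  # local field h_i(t)
        # f_beta(h) = 1 / (1 + exp(-2 beta h)) written as (1 + tanh(beta h)) / 2 to avoid overflow
        p_plus = 0.5 * (1.0 + math.tanh(beta * h))
        state[i] = 1 if rng.random() < p_plus else -1


def overlap(xi, state):
    # Overlap (1/N) sum_i xi_i S_i of a configuration with a pattern.
    return sum(a * b for a, b in zip(xi, state)) / len(xi)


# Illustrative run (all values are arbitrary choices): N = 100 nodes, one strong pattern
# of degree 3 and ten simple patterns, temperature T = 0.5 (beta = 2).
rng = random.Random(0)
N = 100
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(11)]
degrees = [3] + [1] * 10
w = hebbian_weights(patterns, degrees)
state = [-s if rng.random() < 0.2 else s for s in patterns[0]]  # cue: strong pattern with about 20% of bits flipped
for _ in range(10):
    glauber_sweep(state, w, beta=2.0, rng=rng)
m1 = overlap(patterns[0], state)
```

With a degree-3 pattern among ten simple ones, the noisy cue is expected to settle close to the strong pattern at this temperature, which is the kind of behaviour analysed in Section 3.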
The updating is implemented asynchronously in a random way. The energy of the network in the configuration $S = (S_i)_{i=1}^N$ is given by\n$$H(S) = -\\frac{1}{2} \\sum_{i,j=1}^N S_i S_j w_{ij}.$$\nFor large $N$, this specifies a complex system, with an underlying state space of $2^N$ configurations, which in general cannot be solved exactly. However, mean field theory has proved very useful in studying Hopfield networks. The average updated value of $S_i(t + 1)$ in Equation (1) is\n$$\\langle S_i(t + 1) \\rangle = \\frac{1}{1 + e^{-2\\beta h_i(t)}} - \\frac{1}{1 + e^{2\\beta h_i(t)}} = \\tanh(\\beta h_i(t)), \\tag{2}$$\nwhere $\\langle \\ldots \\rangle$ denotes taking the average with respect to the probability distribution in the updating rule in Equation (1). The stationary solution for the mean field thus satisfies\n$$\\langle S_i \\rangle = \\langle \\tanh(\\beta h_i) \\rangle. \\tag{3}$$\nThe average overlap of pattern $\\xi^\\nu$ with the mean field at the nodes of the network is given by\n$$m_\\nu = \\frac{1}{N} \\sum_{i=1}^N \\xi^\\nu_i \\langle S_i \\rangle. \\tag{4}$$\nThe replica technique for solving the mean field problem, used in the case $p/N = \\alpha > 0$ as $N \\to \\infty$, seeks to obtain the average of the overlaps in Equation (4) by evaluating the partition function of the system, namely,\n$$Z = \\mathrm{Tr}_S \\exp(-\\beta H(S)),$$\nwhere the trace $\\mathrm{Tr}_S$ stands for taking the sum over all possible configurations $S = (S_i)_{i=1}^N$. As is generally the case in statistical physics, once the partition function of the system is obtained, all required physical quantities can in principle be computed. However, in this case, the partition function is very difficult to compute since it entails computing the average $\\langle\\langle \\log Z \\rangle\\rangle$ of $\\log Z$, where $\\langle\\langle \\ldots \\rangle\\rangle$ indicates averaging over the random distribution of the stored patterns $\\xi^\\mu$.
To overcome this problem, the identity\n$$\\log Z = \\lim_{k \\to 0} \\frac{Z^k - 1}{k}$$\nis used to reduce the problem to finding the average $\\langle\\langle Z^k \\rangle\\rangle$ of $Z^k$, which is then computed for positive integer values of $k$. For such $k$, we have\n$$Z^k = \\mathrm{Tr}_{S^1} \\mathrm{Tr}_{S^2} \\cdots \\mathrm{Tr}_{S^k} \\exp\\left(-\\beta\\left(H(S^1) + H(S^2) + \\cdots + H(S^k)\\right)\\right),$$\nwhere for each $i = 1, \\ldots, k$ the superscripted configuration $S^i$ is a replica of the configuration state. In computing the trace over each replica, various parameters are obtained, and the replica symmetry condition assumes that these parameters are independent of the particular replica under consideration. Apart from this assumption, there are two basic mathematical problems with the technique which make it unjustifiable [20, page 264]. Firstly, the positive integer $k$ above is eventually treated as a real number near zero without any mathematical justification. Secondly, the order of taking limits, in particular the order of taking the two limits $k \\to 0$ and $N \\to \\infty$, is interchanged several times, again without any mathematical justification.\n\nHere, we develop a mathematically rigorous method for solving the mean field problem, i.e., computing the average of the overlaps in Equation (4) in the case of $p/N = \\alpha > 0$ as $N \\to \\infty$. Our method turns the basic idea of the heuristic presented in [17] and reproduced in [20] for solving the mean field equation into a mathematically verifiable formalism, which for the standard Hopfield network with random stored patterns gives the same result as the replica method, assuming replica symmetry.
In the presence of strong patterns we obtain a set of new results, as explained in the next two sections.\n\nThe mean field equation is obtained from Equation (3) by approximating the right hand side of this equation by the value of $\\tanh$ at the mean field $\\langle h_i \\rangle = \\sum_{j=1}^N w_{ij} \\langle S_j \\rangle$, ignoring the sum $\\sum_{j=1}^N w_{ij}(S_j - \\langle S_j \\rangle)$ for large $N$ [17, page 32]:\n$$\\langle S_i \\rangle = \\tanh(\\beta \\langle h_i \\rangle) = \\tanh\\left(\\frac{\\beta}{N} \\sum_{j=1}^N \\sum_{\\mu=1}^{p_0} d_\\mu \\xi^\\mu_i \\xi^\\mu_j \\langle S_j \\rangle\\right). \\tag{5}$$\nEquation (5) gives the mean field equation for the Hopfield network with $n$ possible strong patterns $\\xi^\\mu$ ($1 \\le \\mu \\le n$) and $p - \\sum_{\\mu=1}^n d_\\mu$ simple patterns $\\xi^\\mu$ with $n + 1 \\le \\mu \\le p_0$. As in the standard Hopfield model, where all patterns are simple, we have two cases to deal with. However, we now have to account for the presence of strong attractors, and our two cases are as follows: (i) in the first case we assume $p_2 := \\sum_{\\mu=1}^{p_0} d_\\mu^2 = o(N)$, which includes the simpler case $p_2 \\ll N$ when $p_2$ is fixed and independent of $N$; (ii) in the second case we assume we have a single strong attractor with the load parameter $p/N = \\alpha > 0$.\n\n3 Stability of strong patterns with noise: $p_2 = o(N)$\n\nThe case of constant $p$ and $N \\to \\infty$ is usually referred to as $\\alpha = 0$ in the standard Hopfield model. Here, we need to consider the sum of the squares of the degrees of all stored patterns (and not just the number of patterns) compared to $N$.
We solve the mean field equation with $T > 0$ by using a method similar in spirit to [20, page 33] for the standard Hopfield model, but in our case strong patterns induce a sequence of independent but non-identically distributed random variables in the crosstalk term, where the Central Limit Theorem cannot be used; we show, however, that Lyapunov\u2019s theorem (Theorem 1.1) can be invoked. In retrieving pattern $\\xi^1$, we look for a solution of the mean field equation of the form $\\langle S_i \\rangle = m \\xi^1_i$, where $m > 0$ is a constant. Using Equation (5) and separating the contribution of $\\xi^1$ in the argument of $\\tanh$, we obtain, for large $N$,\n$$m \\xi^1_i = \\tanh\\left(m\\beta\\left(d_1 \\xi^1_i + \\frac{1}{N} \\sum_{j \\ne i, \\mu > 1} d_\\mu \\xi^\\mu_i \\xi^\\mu_j \\xi^1_j\\right)\\right). \\tag{6}$$\nFor each $N$, $\\mu > 1$ and $j \\ne i$, let\n$$Y_{N\\mu j} = \\frac{d_\\mu}{N} \\xi^\\mu_i \\xi^\\mu_j \\xi^1_j. \\tag{7}$$\nThis gives $(p_0 - 1)(N - 1)$ independent random variables with $E(Y_{N\\mu j}) = 0$, $E(Y_{N\\mu j}^2) = d_\\mu^2/N^2$ and $E(|Y_{N\\mu j}|^3) = d_\\mu^3/N^3$. We have\n$$s_N^2 := \\sum_{\\mu > 1, j \\ne i} E(Y_{N\\mu j}^2) = \\frac{N - 1}{N^2} \\sum_{\\mu > 1} d_\\mu^2 \\sim \\frac{1}{N} \\sum_{\\mu > 1} d_\\mu^2. \\tag{8}$$\nThus, as $N \\to \\infty$, we have\n$$\\frac{1}{s_N^3} \\sum_{\\mu > 1, j \\ne i} E(|Y_{N\\mu j}|^3) \\sim \\frac{1}{\\sqrt{N}} \\frac{\\sum_{\\mu > 1} d_\\mu^3}{\\left(\\sum_{\\mu > 1} d_\\mu^2\\right)^{3/2}} \\to 0, \\tag{9}$$\nsince for positive numbers $d_\\mu$ we always have $\\sum_{\\mu > 1} d_\\mu^3 \\le \\left(\\sum_{\\mu > 1} d_\\mu^2\\right)^{3/2}$. Thus the Lyapunov condition is satisfied for $\\delta = 1$. By Lyapunov\u2019s theorem we deduce\n$$\\frac{1}{N} \\sum_{j \\ne i, \\mu > 1} d_\\mu \\xi^\\mu_i \\xi^\\mu_j \\xi^1_j \\sim N\\left(0, \\sum_{\\mu > 1} d_\\mu^2 / N\\right). \\tag{10}$$\nSince we also have $p_2 = o(N)$, the variance of this distribution tends to zero, and it follows that we can ignore the second term, i.e., the crosstalk term, in the argument of $\\tanh$ in Equation (6) as $N \\to \\infty$; we thus obtain\n$$m = \\tanh(\\beta d_1 m). \\tag{11}$$\nTo examine the fixed points of Equation (11), we let $d = d_1$ for convenience and put $x = \\beta d m = dm/T$, so that $\\tanh x = Tx/d$; see Figure 1. It follows that $T_c = d$ is the critical temperature. If $T < d$ then there is a non-zero (non-trivial) solution for $m$, whereas for $T > d$ we only have the trivial solution. For $d = 1$ our solution is that of the standard Hopfield network as in [20, page 34].\n\nFigure 1: Stability of strong attractors with noise\n\nTheorem 3.1 The critical temperature for stability of a strong attractor is equal to its degree.\n\n4 Mean field equations for $p/N = \\alpha > 0$\n\nThe case $p/N = \\alpha$, as for the standard Hopfield model, is much harder, and here we assume we have only a single pattern $\\xi^1$ with $d_1 \\ge 1$, while the rest of the patterns $\\xi^\\mu$ are simple with $d_\\mu = 1$ for $2 \\le \\mu \\le p_0$. The case when there is more than one strong pattern is harder and will be dealt with in a future paper. Moreover, we assume $d_1 \\ll p_0$, which is the interesting case in applications. If $d_1 > 1$ then we have a single strong pattern, whereas if $d_1 = 1$ the network is reduced to the standard Hopfield network. We recall that all patterns $\\xi^\\mu$ for $1 \\le \\mu \\le p_0$ are independent and random.
Since $p$ and $N$ are assumed to be large and $d_1 \\ll p_0$, we will replace $p_0$ with $p$ and approximate terms like $p - 2$ with $p$.\n\nWe again consider the mean field equation (5) for retrieving pattern $\\xi^1$, but now the crosstalk term in (6) is large and can no longer be ignored. We therefore look at the overlaps, Equation (4), of the mean field with all the stored patterns $\\xi^\\nu$ and not just $\\xi^1$. Combining Equations (5) and (4), we eliminate the mean field to obtain a recursive equation for the overlaps as the new variables:\n$$m_\\nu = \\frac{1}{N} \\sum_{i=1}^N \\xi^\\nu_i \\tanh\\left(\\beta \\sum_{\\mu=1}^p d_\\mu \\xi^\\mu_i m_\\mu\\right). \\tag{12}$$\nWe now have a family of $p$ stochastic equations for the random variables $m_\\nu$ with $1 \\le \\nu \\le p$ in order to retrieve the random pattern $\\xi^1$. Formally, we assume we have a probability space $(\\Omega, \\mathcal{F}, P)$ with the real-valued random variables $m_\\nu : \\Omega \\to \\mathbb{R}$, which are measurable with respect to $\\mathcal{F}$ and the Borel sigma field $\\mathcal{B}$ over the real line and which take value $m_\\nu(\\omega) \\in \\mathbb{R}$ for each sample point $\\omega \\in \\Omega$. The probability of an event $A \\in \\mathcal{B}$ is given by $\\Pr\\{\\omega : m_\\nu(\\omega) \\in A\\}$. As usual, $\\Omega$ can itself be taken to be the real line with its Borel sigma field, and we will usually drop all references to $\\Omega$. We need two lemmas to prove our main result. We write $X_N \\xrightarrow{a.s.} X$ for the almost sure convergence of the sequence of random variables $X_N$ to $X$, whereas $X_N \\xrightarrow{d} X$ indicates convergence in distribution [6]. Recall that almost sure convergence implies convergence in distribution.
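For a finite network, Equation (12) can also be explored directly by fixed-point iteration on one random draw of the patterns. The following Python sketch is illustrative only: it is not the method of this paper (which concerns the distribution of the overlaps as N tends to infinity), and the helper name iterate_overlaps, the network size, load, degree and temperature are all assumptions. Here p counts distinct patterns, i.e., p0 in the text, which the text approximates by p.

```python
import math
import random


def iterate_overlaps(patterns, degrees, beta, m0, n_iter=30):
    # Fixed-point iteration of Equation (12) for one sampled set of patterns:
    #   m_nu <- (1/N) sum_i xi^nu_i tanh(beta sum_mu d_mu xi^mu_i m_mu)
    n = len(patterns[0])
    m = list(m0)
    for _ in range(n_iter):
        # Mean local field at each node, expressed through the current overlaps.
        fields = [sum(d * xi[i] * mu for xi, d, mu in zip(patterns, degrees, m))
                  for i in range(n)]
        t = [math.tanh(beta * h) for h in fields]
        m = [sum(xi[i] * t[i] for i in range(n)) / n for xi in patterns]
    return m


# Illustrative parameters (assumptions): N = 400, p = 40 distinct patterns (load 0.1),
# a single strong pattern of degree 2, temperature T = 0.1 (beta = 10).
rng = random.Random(1)
N, p = 400, 40
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
degrees = [2] + [1] * (p - 1)
m = iterate_overlaps(patterns, degrees, beta=10.0, m0=[1.0] + [0.0] * (p - 1))
```

Starting from m = (1, 0, ..., 0), the overlap with the degree-2 pattern is expected to stay close to 1, while the remaining overlaps stay small, of the order of 1/√N.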
To help\nus compute the right hand side of Equation (12), we need the following lemma, which extends the\nstandard result for the Law of Large Numbers and its rate of convergence [5, pages 112 and 113].\n\nLemma 4.1 Let X be a random variable on IR such that its probability distribution F (x) =\nPr(X \u2264 x) is differentiable with density F (cid:48)(x) = f (x). If g : IR \u2192 IR is a bounded measur-\nable function and Xk (k \u2265 1) is a sequence of of independent and identically distributed random\nvariables with distribution X, then\n\n(cid:90) \u221e\n(cid:33)\n\n\u221e\n\ng(Xi)\n\na.s.\u2212\u2192 Eg(X) =\n\ng(x)f (x)dx,\n\n(g(Xi) \u2212 kE(g)(X))\n\n\u2265 \u0001\n\n(cid:33)\n\n= o(1/N t\u22121) (cid:3)\n\n(13)\n\n(14)\n\nand for all \u0001 > 0 and t > 1, we have:\n\n1\nN\n\n(cid:32)\n\nN(cid:88)\n\ni=1\n\nk(cid:88)\n\ni=1\n\n1\nk\n\n(cid:32)\n\nPr\n\nsup\nk\u2265N\n\nThe proof of the above lemma is given on-line in the supplementary material.\nAssume p/N = \u03b1 > 0 with d1 (cid:28) p0 and d\u00b5 = 1 for 1 < \u00b5 \u2264 p0. 
In the following theorem, we use the basic idea of the heuristic in [17], which is reproduced in [20, section 2.5], to develop a verifiable mathematical method with provable results to solve the mean field equation in the more general case that we have a single strong pattern present in the network.\n\nTheorem 4.2 There is a solution to the mean field equations (12) for retrieving $\\xi^1$ with independent random variables $m_\\nu$ (for $1 \\le \\nu \\le p_0$), where $m_1 \\sim N(m, s/N)$ and $m_\\nu \\sim N(0, r/N)$ (for $\\nu \\ne 1$), if the real numbers $m$, $s$ and $r$, together with an auxiliary quantity $q$, satisfy the four simultaneous equations:\n$$\\begin{cases} \\text{(i)} & m = \\int_{-\\infty}^{\\infty} \\frac{dz}{\\sqrt{2\\pi}} e^{-z^2/2} \\tanh\\left(\\beta(d_1 m + \\sqrt{\\alpha r}\\, z)\\right) \\\\ \\text{(ii)} & s = q - m^2 \\\\ \\text{(iii)} & q = \\int_{-\\infty}^{\\infty} \\frac{dz}{\\sqrt{2\\pi}} e^{-z^2/2} \\tanh^2\\left(\\beta(d_1 m + \\sqrt{\\alpha r}\\, z)\\right) \\\\ \\text{(iv)} & r = \\frac{q}{(1 - \\beta(1 - q))^2} \\end{cases} \\tag{15}$$\nIn the proof of this theorem, which is given in the supplementary material, we seek a solution of the mean field equations assuming we have independent random variables $m_\\nu$ (for $1 \\le \\nu \\le p_0$) such that for large $N$ and $p$ with $p/N = \\alpha$, we have $m_1 \\sim N(m, s/N)$ and $m_\\nu \\sim N(0, r/N)$ ($\\nu \\ne 1$), and then find conditions in terms of $m$, $s$ and $r$ to ensure that such a solution exists. These assumptions are in effect equivalent to the replica symmetry approximation [17, page 262], since they lead, as shown below, to the same solution derived from the replica method when all stored patterns are simple. In analogy with the replica technique, we call our solution symmetric.
Since, by our assumption about the distribution of the overlaps $m_\\mu$, the standard deviation of each overlap is $O(1/\\sqrt{N})$, we ignore terms of $O(1/N)$, and more generally terms of $o(1/\\sqrt{N})$, compared to terms of $O(1/\\sqrt{N})$ in the proof, including in the lemma below, which enables us to compute the argument of $\\tanh$ in Equation (12) for large $N$.\n\nLemma 4.3 If $m_\\nu \\sim N(0, r/N)$ (for $\\nu \\ne 1$), then we have the equivalence of distributions:\n$$\\sum_{\\mu \\ne 1, \\nu} \\xi^1_i \\xi^\\mu_i m_\\mu \\sim N(0, \\alpha r) \\sim \\sum_{\\mu \\ne 1} \\xi^1_i \\xi^\\mu_i m_\\mu.$$\nThe proofs of the above lemma and Theorem 4.2 are given online in the supplementary material.\n\nWe note that in the heuristic described in [20], the distributions of $m_1$ and $m_\\nu$ ($\\nu \\ne 1$) are never actually determined, yet an initial assumption about the variance of $m_\\nu$ is made. Moreover, the heuristic makes no assumption on how $m_\\nu$ is distributed, and no valid justification is provided for computing the double summation to obtain $m_\\nu$, which is similar to the lack of justification for the interchange of limits in the replica technique mentioned in Section 2.\n\nComparing the equations for $m$, $q$ and $r$ in Equations (15) with those obtained by the replica method [20, pages 263\u20134] or the heuristic in [20, page 37], we see that $m$ has been replaced by $d_1 m$ on the right hand side of the equations for $m$ and $q$. It follows that for $d_1 = 1$, we obtain the solution for random patterns in the standard Hopfield network produced by the replica method.\n\nWe can solve the simultaneous equations in (15) for $m$, $q$ and $r$ (and then for $s$) numerically.
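One way to do this numerically, assuming finite β, is a damped fixed-point iteration of (i), (iii) and (iv), with the Gaussian averages computed by the trapezoidal rule on a truncated interval. This is a minimal Python sketch under those assumptions; the function names, damping factor, grid and starting point are illustrative choices, and convergence is not guaranteed for every (α, β, d1), in particular near β(1 − q) = 1.

```python
import math


def gauss_average(fn, n=801, zmax=8.0):
    # Trapezoidal approximation of the Gaussian average: integral of dz/sqrt(2 pi) exp(-z^2/2) fn(z).
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for k in range(n):
        z = -zmax + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * z * z) * fn(z)
    return total * h / math.sqrt(2.0 * math.pi)


def solve_eq15(alpha, beta, d1, m=1.0, q=1.0, r=1.0, n_iter=100, damping=0.5):
    # Damped fixed-point iteration of (i), (iii) and (iv) in Equations (15); s then follows from (ii).
    for _ in range(n_iter):
        a = math.sqrt(alpha * r)
        m_new = gauss_average(lambda z: math.tanh(beta * (d1 * m + a * z)))
        q_new = gauss_average(lambda z: math.tanh(beta * (d1 * m + a * z)) ** 2)
        r_new = q_new / (1.0 - beta * (1.0 - q_new)) ** 2
        m = damping * m + (1.0 - damping) * m_new
        q = damping * q + (1.0 - damping) * q_new
        r = damping * r + (1.0 - damping) * r_new
    return m, q, r, q - m * m


# Same ratio alpha / d1^2 = 0.05 for a simple pattern and for a degree-2 pattern, at T = 0.1.
m_simple, q_simple, r_simple, s_simple = solve_eq15(alpha=0.05, beta=10.0, d1=1)
m_strong, q_strong, r_strong, s_strong = solve_eq15(alpha=0.2, beta=10.0, d1=2)
```

Both calls are expected to give m close to 1, consistent with the square law derived below, although the equations are not exactly invariant under this rescaling at finite β.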
As in [20, page 38], we examine when these equations have non-trivial solutions (i.e., $m \\ne 0$) when $T \\to 0$, corresponding to $\\beta \\to \\infty$, where we also have $q \\to 1$ but $C := \\beta(1 - q)$ remains finite. Using the relations\n$$\\begin{cases} \\int_{-\\infty}^{\\infty} \\frac{dz}{\\sqrt{2\\pi}} e^{-z^2/2}\\left(1 - \\tanh^2 \\beta(az + b)\\right) \\approx \\sqrt{\\frac{2}{\\pi}} \\frac{1}{a\\beta} e^{-b^2/2a^2} \\\\ \\int_{-\\infty}^{\\infty} \\frac{dz}{\\sqrt{2\\pi}} e^{-z^2/2} \\tanh \\beta(az + b) \\xrightarrow{\\beta \\to \\infty} \\mathrm{erf}\\left(\\frac{b}{\\sqrt{2}\\, a}\\right), \\end{cases} \\tag{16}$$\nwhere erf is the error function, the three equations for $m$, $q$ and $r$ become\n$$\\begin{cases} C := \\beta(1 - q) = \\sqrt{\\frac{2}{\\pi \\alpha r}} \\exp\\left(-\\frac{(dm)^2}{2\\alpha r}\\right) \\\\ r = 1/(1 - C)^2 \\\\ m = \\mathrm{erf}\\left(dm/\\sqrt{2\\alpha r}\\right), \\end{cases} \\tag{17}$$\nwhere we have put $d := d_1$. Let $y = dm/\\sqrt{2\\alpha r}$; then we obtain\n$$f_{\\alpha,d}(y) := \\frac{y}{d}\\left(\\sqrt{2\\alpha} + \\frac{2}{\\sqrt{\\pi}} e^{-y^2}\\right) = \\mathrm{erf}(y). \\tag{18}$$\nFigure 2 gives a schematic view of the solution of Equation (18). The dotted curve is the erf function on the right hand side of the equation, whereas the three solid curves correspond to the graphs of the function $f_{\\alpha,d}$ on the left hand side of the equation for a given value of $d$ and three different values of $\\alpha$. The heights of these graphs increase with $\\alpha$.\n\nThe critical load parameter $\\alpha_c(d)$ is the threshold such that for $\\alpha < \\alpha_c(d)$ the strong pattern with degree $d$ can be retrieved, whereas for $\\alpha_c(d) < \\alpha$ this memory is lost.
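Numerically, α_c(d) can be approximated as the largest α for which Equation (18) has a root y > 0. The Python sketch below assumes this criterion and uses bisection on α. Existence of a root is detected by checking whether erf(y) − f_{α,d}(y) becomes positive somewhere on a grid, which suffices because f_{α,d} grows linearly in y while erf is bounded by 1; at α = 2d²/π one has f_{α,d}(y) > 2y/√π > erf(y) for all y > 0, which gives a safe upper bracket. The grid range, tolerance and function names are assumptions.

```python
import math


def f_alpha_d(y, alpha, d):
    # Left-hand side of Equation (18).
    return (y / d) * (math.sqrt(2.0 * alpha) + 2.0 / math.sqrt(math.pi) * math.exp(-y * y))


def has_nontrivial_solution(alpha, d, y_max=5.0, steps=5000):
    # True if erf(y) - f_{alpha,d}(y) > 0 at some grid point y in (0, y_max]; since
    # f_{alpha,d} eventually exceeds erf, this implies a root y > 0 of Equation (18).
    for k in range(1, steps + 1):
        y = y_max * k / steps
        if math.erf(y) - f_alpha_d(y, alpha, d) > 0.0:
            return True
    return False


def critical_load(d, tol=1e-6):
    # Bisection for alpha_c(d); f_{alpha,d} increases with alpha, so the criterion is monotone in alpha.
    lo, hi = 0.0, 2.0 * d * d / math.pi  # no non-trivial solution at the upper end
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if has_nontrivial_solution(mid, d):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ac1 = critical_load(1)
ac2 = critical_load(2)
```

For d = 1 this should recover the familiar value of about 0.138, and the ratio ac2 / ac1 can then be compared with the bound α_c(d) > d²α_c(1) obtained from Equations (19) and (20).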
Geometrically, $\\alpha_c(d)$ corresponds to the curve that is tangent, say at $y_d$, to the error function, i.e.,\n$$f'_{\\alpha_c(d),d}(y_d) = \\mathrm{erf}'(y_d).$$\nFor $\\alpha < \\alpha_c(d)$, the function $f_{\\alpha,d}$ has two non-trivial intersections (away from the origin) with erf, while for $\\alpha_c(d) < \\alpha$ there are no non-trivial intersections.\n\nWe can compare the storage capacity of strong patterns with that of simple patterns, assuming the independence of $m_\\nu$ (equivalently, replica symmetry), by finding a lower bound for $\\alpha_c(d)$ in terms of $\\alpha_c(1)$ as follows. We have\n$$f_{\\alpha,d}(y) = y\\left(\\sqrt{2(\\alpha/d^2)} + \\frac{2}{d\\sqrt{\\pi}} e^{-y^2}\\right) \\le y\\left(\\sqrt{2(\\alpha/d^2)} + \\frac{2}{\\sqrt{\\pi}} e^{-y^2}\\right), \\tag{19}$$\nwhere equality holds iff $d = 1$. Putting $\\alpha = d^2 \\alpha_c(1)$ and $y = y_1$, we have for $d > 1$:\n$$f_{d^2\\alpha_c(1),d}(y_1) < f_{\\alpha_c(1),1}(y_1) = \\mathrm{erf}(y_1). \\tag{20}$$\nTherefore, for a strong pattern, the graphs of $f_{d^2\\alpha_c(1),d}$ and erf intersect in two non-trivial points and thus $\\alpha_c(d) > d^2 \\alpha_c(1)$. Since $\\alpha_c(1) = \\alpha_c \\approx 0.138$, this yields $\\alpha_c(d)/0.138 > d^2$, i.e., the relative increase in the storage capacity exceeds the square of the degree of the strong pattern.\n\nIn the case of the standard Hopfield network with simple patterns only, we have $\\alpha_c(1) = \\alpha_c \\approx 0.138$, but simulation experiments show that for values in the narrow range $0.138 < \\alpha < 0.144$ there are replica symmetry breaking solutions for which a stored pattern can still be retrieved [12]. We show that the square property holds when we take into account symmetry breaking solutions. By [15, Theorem 1], it follows that the error probability of retrieving a single strong attractor is\n$$\\mathrm{Pr}_{\\mathrm{er}} \\approx \\frac{1}{2}\\left(1 - \\mathrm{erf}\\left(\\frac{d}{\\sqrt{2\\alpha}}\\right)\\right),$$\nfor $\\alpha = p/N$.
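The scaling argument can be checked directly from this approximation. In the Python sketch below (the function name retrieval_error and the chosen loads are illustrative), rescaling the load as α = d² · 0.138 leaves d/√(2α), and hence the approximate error probability, unchanged for every degree d.

```python
import math


def retrieval_error(d, alpha):
    # Approximation Pr_er = (1/2) (1 - erf(d / sqrt(2 alpha))) for a single strong pattern of degree d.
    return 0.5 * (1.0 - math.erf(d / math.sqrt(2.0 * alpha)))


alpha_simple = 0.138  # load at which a simple pattern (d = 1) is evaluated
# Each row: degree d, rescaled load d^2 * alpha_simple, approximate error probability.
table = [(d, d * d * alpha_simple, retrieval_error(d, d * d * alpha_simple)) for d in (1, 2, 3)]
```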
Thus, this error will be constant if $d/\\sqrt{\\alpha}$ remains fixed, indicating that the critical value of the load parameter is proportional to the square of the degree of the strong attractor.\n\nCorollary 4.4 The storage capacity for retrieving a single strong pattern exceeds that of a simple pattern by the square of the degree of the strong pattern.\n\nThis square property shows that a multiply learned pattern is retained in the memory in the presence of a large number of other random patterns, proportional to the square of its multiplicity.\n\n5 Conclusion\n\nWe have developed a mathematically justifiable method to derive the storage capacity of the Hopfield network when the load parameter $\\alpha = p/N$ remains a positive constant as the network size $N \\to \\infty$. For the standard model, our result confirms that of the replica technique, i.e., $\\alpha_c \\approx 0.138$. However, our method also computes the storage capacity when retrieving a single strong pattern of degree $d$ in the presence of other random patterns, and we have shown that this capacity exceeds that of a simple pattern by a multiplicative factor $d^2$, providing further justification for using strong patterns of Hopfield networks to model attachment types and behavioural prototypes in psychology.\n\nThe storage capacity of Hopfield networks when there is more than a single strong pattern, and in networks with low neural activation, will be addressed in future work. It is also of interest to examine the behaviour of strong patterns in Boltzmann Machines [20], Restricted Boltzmann Machines [28] and Deep Learning Networks [21].\n\nReferences\n[1] D. J. Amit. Modeling Brain Function: The World of Attractor Neural Networks. Cambridge, 1989.\n[2] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Spin-glass models of neural networks. Phys. Rev. A, 32:1007\u20131018, 1985.\n[3] D. J. Amit, H.
Gutfreund, and H. Sompolinsky. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett., 55:1530\u20131533, Sep 1985.\n[4] D. J. Amit, H. Gutfreund, and H. Sompolinsky. Information storage in neural networks with low levels of activity. Phys. Rev. A, 35:2293\u20132303, Mar 1987.\n[5] L. E. Baum and M. Katz. Convergence rates in the law of large numbers. Transactions of the American Mathematical Society, 120(1):108\u2013123, 1965.\n[6] P. Billingsley. Probability and Measure. John Wiley & Sons, second edition, 1986.\n[7] E. Bolthausen. Random media and spin glasses: An introduction into some mathematical results and problems. In E. Bolthausen and A. Bovier, editors, Spin Glasses, volume 1900 of Lecture Notes in Mathematics. Springer, 2007.\n[8] A. Bovier and V. Gayrard. Hopfield models as generalized random mean field models. In A. Bovier and P. Picco, editors, Mathematical Aspects of Spin Glasses and Neural Networks, pages 3\u201389. Birkh\u00e4user, 1998.\n[9] John Bowlby. Attachment: Volume One of the Attachment and Loss Trilogy. Pimlico, second revised edition, 1997.\n[10] L. Cozolino. The Neuroscience of Human Relationships. W. W. Norton, 2006.\n[11] F. Crick and G. Mitchison. The function of dream sleep. Nature, 304:111\u2013114, 1983.\n[12] A. Crisanti, D. J. Amit, and H. Gutfreund. Saturation level of the Hopfield model for neural network. Europhys. Lett., 2:337, 1986.\n[13] L. F. Cugliandolo and M. V. Tsodyks. Capacity of networks with correlated attractors. Journal of Physics A: Mathematical and General, 27(3):741, 1994.\n[14] V. Dotsenko. An Introduction to the Theory of Spin Glasses and Neural Networks. World Scientific, 1994.\n[15] A. Edalat and F. Mancinelli. Strong attractors of Hopfield neural networks to model attachment types and behavioural patterns. In IJCNN 2013 Conference Proceedings. IEEE, August 2013.\n[16] K. H. Fischer and J. A. Hertz.
Spin Glasses (Cambridge Studies in Magnetism). Cambridge, 1993.\n[17] T. Geszti. Physical Models of Neural Networks. World Scientific, 1990.\n[18] R. J. Glauber. Time-dependent statistics of the Ising model. J. Math. Phys., 4:294, 1963.\n[19] H. Gutfreund. Neural networks with hierarchically correlated patterns. Phys. Rev. A, 37:570\u2013577, 1988.\n[20] J. A. Hertz, A. S. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Westview Press, 1991.\n[21] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527\u20131554, 2006.\n[22] R. E. Hoffman. Computer simulations of neural information processing and the schizophrenia-mania dichotomy. Arch. Gen. Psychiatry, 44(2):178\u2013188, 1987.\n[23] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79:2554\u20132558, 1982.\n[24] T. Lewis, F. Amini, and R. Lannon. A General Theory of Love. Vintage, 2000.\n[25] M. L\u00f6we. On the Storage Capacity of Hopfield Models with Correlated Patterns. Annals of Applied Probability, 8(4):1216\u20131250, 1998.\n[26] M. M\u00e9zard, G. Parisi, and M. Virasoro, editors. Spin Glass Theory and Beyond. World Scientific, 1986.\n[27] P. Peretto. On learning rules and memory storage abilities of asymmetrical neural networks. J. Phys. France, 49:711\u2013726, 1988.\n[28] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning, pages 791\u2013798, 2007.\n[29] A. N. Schore. Affect Dysregulation and Disorders of the Self. W. W. Norton, 2003.\n[30] T. S. Smith, G. T. Stevens, and S. Caldwell. The familiar and the strange: Hopfield network models for prototype-entrained. In D. D. Franks and T. S. Smith, editors,
Mind, brain, and society: Toward a neurosociology of emotion, volume 5 of Social perspectives on emotion. Elsevier/JAI, 1999.\n[31] M. Tsodyks and M. Feigelman. Enhanced storage capacity in neural networks with low level of activity. Europhysics Letters, 6:101\u2013105, 1988.", "award": [], "sourceid": 1248, "authors": [{"given_name": "Abbas", "family_name": "Edalat", "institution": "Imperial College London"}]}