{"title": "A memory frontier for complex synapses", "book": "Advances in Neural Information Processing Systems", "page_first": 1034, "page_last": 1042, "abstract": "An incredible gulf separates theoretical models of synapses, often described solely by a single scalar value denoting the size of a postsynaptic potential, from the immense complexity of molecular signaling pathways underlying real synapses. To understand the functional contribution of such molecular complexity to learning and memory, it is essential to expand our theoretical conception of a synapse from a single scalar to an entire dynamical system with many internal molecular functional states. Moreover, theoretical considerations alone demand such an expansion; network models with scalar synapses assuming finite numbers of distinguishable synaptic strengths have strikingly limited memory capacity. This raises the fundamental question, how does synaptic complexity give rise to memory? To address this, we develop new mathematical theorems elucidating the relationship between the structural organization and memory properties of complex synapses that are themselves molecular networks. 
Moreover, in proving such theorems, we uncover a framework, based on first passage time theory, to impose an order on the internal states of complex synaptic models, thereby simplifying the relationship between synaptic structure and function.", "full_text": "A memory frontier for complex synapses

Subhaneil Lahiri and Surya Ganguli

Department of Applied Physics, Stanford University, Stanford CA
sulahiri@stanford.edu, sganguli@stanford.edu

Abstract

An incredible gulf separates theoretical models of synapses, often described solely by a single scalar value denoting the size of a postsynaptic potential, from the immense complexity of molecular signaling pathways underlying real synapses. To understand the functional contribution of such molecular complexity to learning and memory, it is essential to expand our theoretical conception of a synapse from a single scalar to an entire dynamical system with many internal molecular functional states. Moreover, theoretical considerations alone demand such an expansion; network models with scalar synapses assuming finite numbers of distinguishable synaptic strengths have strikingly limited memory capacity. This raises the fundamental question, how does synaptic complexity give rise to memory? To address this, we develop new mathematical theorems elucidating the relationship between the structural organization and memory properties of complex synapses that are themselves molecular networks. Moreover, in proving such theorems, we uncover a framework, based on first passage time theory, to impose an order on the internal states of complex synaptic models, thereby simplifying the relationship between synaptic structure and function.

1 Introduction

It is widely thought that our very ability to remember the past over long time scales depends crucially on our ability to modify synapses in our brain in an experience-dependent manner.
Classical models of synaptic plasticity treat synaptic efficacy as an analog scalar value, denoting the size of a postsynaptic potential injected into one neuron from another. Theoretical work has shown that such models have a reasonable, extensive memory capacity, in which the number of long term associations that can be stored by a neuron is proportional to its number of afferent synapses [1–3]. However, recent experimental work has shown that many synapses are more digital than analog; they cannot robustly assume an infinite continuum of analog values, but rather can only take on a finite number of distinguishable strengths, a number that can be as small as two [4–6] (though see [7]). This one simple modification leads to a catastrophe in memory capacity: classical models with digital synapses, when operating in a palimpsest mode in which the ongoing storage of new memories can overwrite previous memories, have a memory capacity proportional to the logarithm of the number of synapses [8, 9]. Intuitively, when synapses are digital, the storage of a new memory can flip a population of synaptic switches, thereby rapidly erasing previous memories stored in the same synaptic population. This result indicates that the dominant theoretical basis for the storage of long term memories in modifiable synaptic switches is flawed.

Recent work [10–12] has suggested that a way out of this logarithmic catastrophe is to expand our theoretical conception of a synapse from a single scalar value to an entire stochastic dynamical system in its own right. This conceptual expansion is further necessitated by the experimental reality that synapses contain within them immensely complex molecular signaling pathways, with many internal molecular functional states (e.g. see [4, 13, 14]).
While externally, synaptic efficacy could be digital, candidate patterns of electrical activity leading to potentiation or depression could yield transitions between these internal molecular states without necessarily inducing an associated change in synaptic efficacy. This form of synaptic change, known as metaplasticity [15, 16], can allow the probability of synaptic potentiation or depression to acquire a rich dependence on the history of prior changes in efficacy, thereby potentially improving memory capacity.

Theoretical studies of complex, metaplastic synapses have focused on analyzing the memory performance of a limited number of very specific molecular dynamical systems, characterized by a number of internal states in which potentiation and depression each induce a specific set of allowable transitions between states (e.g. see Figure 1 below). While these models can vastly outperform simple binary synaptic switches, these analyses leave open several deep and important questions. For example, how does the structure of a synaptic dynamical system determine its memory performance? What are the fundamental limits of memory performance over the space of all possible synaptic dynamical systems? What is the structural organization of synaptic dynamical systems that achieve these limits? Moreover, from an experimental perspective, it is unlikely that all synapses can be described by a single canonical synaptic model; just as in the case of neurons, there is an incredible diversity of molecular networks underlying synapses, both across species and across brain regions within a single organism [17].
In order to elucidate the functional contribution of this diverse molecular complexity to learning and memory, it is essential to move beyond the analysis of specific models and instead develop a general theory of learning and memory for complex synapses. Moreover, such a general theory of complex synapses could aid in the development of novel artificial memory storage devices.

Here we initiate such a general theory by proving upper bounds on the memory curve associated with any synaptic dynamical system, within the well established ideal observer framework of [10, 11, 18]. Along the way we develop principles based on first passage time theory to order the structure of synaptic dynamical systems and relate this structure to memory performance. We summarize our main results in the discussion section.

2 Overall framework: synaptic models and their memory curves

In this section, we describe the class of models of synaptic plasticity that we are studying and how we quantify their memory performance. In the subsequent sections, we will find upper bounds on this performance.

We use a well established formalism for the study of learning and memory with complex synapses (see [10, 11, 18]). In this approach, electrical patterns of activity corresponding to candidate potentiating and depressing plasticity events occur randomly and independently at all synapses at a Poisson rate r. These events reflect possible synaptic changes due to either spontaneous network activity or the storage of new memories. We let f^pot and f^dep denote the fractions of these events that are candidate potentiating or depressing events, respectively. Furthermore, we assume our synaptic model has M internal molecular functional states, and that a candidate potentiating (depressing) event induces a stochastic transition in the internal state, described by an M × M discrete time Markov transition matrix M^pot (M^dep).
In this framework, the states of different synapses will be independent, and the entire synaptic population can be fully described by the probability distribution across these states, which we will indicate with the row vector p(t). Thus the i'th component of p(t) denotes the fraction of the synaptic population in state i. Furthermore, each state i has its own synaptic weight, w_i, which we take, in the worst case scenario, to be restricted to two values. After shifting and scaling these two values, we can assume they are ±1, without loss of generality.

We also employ an "ideal observer" approach to the memory readout, where the synaptic weights are read directly. This provides an upper bound on the quality of any readout using neural activity. For any single memory, stored at time t = 0, we assume there will be an ideal pattern of synaptic weights across a population of N synapses, the N-element vector w_ideal, that is +1 at all synapses that experience a candidate potentiation event, and −1 at all synapses that experience a candidate depression event at the time of memory storage. We assume that any pattern of synaptic weights close to w_ideal is sufficient to recall the memory. However, the actual pattern of synaptic weights at some later time, t, will change to w(t) due to further modifications from the storage of subsequent memories. We can use the overlap between these, w_ideal · w(t), as a measure of the quality of the memory. As t → ∞, the system will return to its steady state distribution, which will be uncorrelated

Figure 1: Models of complex synapses. (a) The cascade model of [10], showing transitions between states of high/low synaptic weight (red/blue circles) due to potentiation/depression (solid red/dashed blue arrows). (b) The serial model of [12].
(c) The memory curves of these two models, showing the decay of the signal-to-noise ratio (to be defined in §2) as subsequent memories are stored.

with the memory stored at t = 0. The probability distribution of the quantity w_ideal · w(∞) can be used as a "null model" for comparison.

The extent to which the memory has been stored is described by a signal-to-noise ratio (SNR) [10, 11]:

    SNR(t) = [ ⟨w_ideal · w(t)⟩ − ⟨w_ideal · w(∞)⟩ ] / √( Var(w_ideal · w(∞)) ).    (1)

The noise in the denominator is essentially √N. There is a correction when potentiation and depression are imbalanced, but this will not affect the upper bounds that we discuss below, and it will be ignored in the subsequent formulae.

A simple average memory curve can be derived as follows. All of the preceding plasticity events, prior to t = 0, will put the population of synapses in its steady-state distribution, p^∞. The memory we are tracking at t = 0 will change the internal state distribution to p^∞ M^pot (or p^∞ M^dep) in those synapses that experience a candidate potentiation (or depression) event. As the potentiating/depressing nature of the subsequent memories is independent of w_ideal, we can average over all sequences, resulting in the evolution of the probability distribution:

    dp(t)/dt = r p(t) W^F,   where   W^F = f^pot M^pot + f^dep M^dep − I.    (2)

Here W^F is a continuous time transition matrix that models the process of forgetting the memory stored at time t = 0 due to random candidate potentiation/depression events occurring at each synapse due to the storage of subsequent memories.
Its stationary distribution is p^∞. This results in the following SNR:

    SNR(t) = √N (2 f^pot f^dep) p^∞ (M^pot − M^dep) e^{r t W^F} w.    (3)

A detailed derivation of this formula can be found in the supplementary material. We will frequently refer to this function as the memory curve. It can be thought of as the excess fraction of synapses (relative to equilibrium) that maintain their ideal synaptic strength at time t, as dictated by the memory stored at time t = 0.

Much of the previous work on these types of complex synaptic models has focused on understanding the memory curves of specific models, i.e. specific choices of M^pot/dep. Two examples of these models are shown in Figure 1. We see that they have different memory properties. The serial model performs relatively well at one particular timescale, but it performs poorly at other times. The cascade model does not perform quite as well at that time, but it maintains its performance over a wider range of timescales.

In this work, rather than analyzing specific models, we take a different approach, in order to obtain a more general theory. We consider the entire space of these models and find upper bounds on the memory capacity of any of them. The space of models with a fixed number of internal states M is parameterized by the pair of M × M discrete time stochastic transition matrices M^pot and M^dep, in addition to f^pot/dep. The parameters must satisfy the following constraints:

    M^pot/dep_ij ∈ [0, 1],   Σ_j M^pot/dep_ij = 1,   f^pot/dep ∈ [0, 1],   f^pot + f^dep = 1,
    w_i = ±1,   p^∞ W^F = 0,   Σ_i p^∞_i = 1.    (4)

The upper bounds on M^pot/dep_ij and f^pot/dep follow automatically from the other constraints. The critical question is: what do these constraints imply about the space of achievable memory curves in (3)?
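As a concrete check of (3), the memory curve of any model can be computed numerically from M^pot, M^dep and w alone. A minimal sketch in Python (not from the paper; the function name `memory_curve` and the default parameter values are illustrative; the example model is the simple binary synapse, for which SNR(0) = √N and the decay is a single exponential):

```python
import numpy as np
from scipy.linalg import expm

def memory_curve(Mpot, Mdep, w, fpot=0.5, r=1.0, N=100):
    """SNR(t) of eq. (3): sqrt(N) * 2 f^pot f^dep * p_inf (Mpot - Mdep) exp(r t W_F) w."""
    fdep = 1.0 - fpot
    M = Mpot.shape[0]
    WF = fpot * Mpot + fdep * Mdep - np.eye(M)      # forgetting generator, eq. (2)
    # stationary distribution p_inf: left null vector of W_F, normalized to sum to 1
    evals, evecs = np.linalg.eig(WF.T)
    p_inf = np.real(evecs[:, np.argmin(np.abs(evals))])
    p_inf /= p_inf.sum()
    pref = np.sqrt(N) * 2.0 * fpot * fdep * (p_inf @ (Mpot - Mdep))
    return lambda t: float(np.real(pref @ expm(r * t * WF) @ w))

# Binary synapse: potentiation always ends strong, depression always ends weak
Mpot = np.array([[0.0, 1.0], [0.0, 1.0]])
Mdep = np.array([[1.0, 0.0], [1.0, 0.0]])
snr = memory_curve(Mpot, Mdep, w=np.array([-1.0, 1.0]))
# SNR(0) = sqrt(N) = 10 for this model, decaying as exp(-r t)
```

The same function works for any pair of stochastic matrices satisfying (4), which makes it easy to reproduce memory curves like those in Figure 1(c).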
To answer this question, especially for limits on achievable memory at finite times, it will be useful to employ the eigenmode decomposition:

    W^F = − Σ_a q_a u_a v_a,   v_a u_b = δ_ab,   W^F u_a = −q_a u_a,   v_a W^F = −q_a v_a.    (5)

Here the q_a are the negatives of the eigenvalues of the forgetting process W^F, the u_a are the right (column) eigenvectors and the v_a are the left (row) eigenvectors. This decomposition allows us to write the memory curve as a sum of exponentials,

    SNR(t) = √N Σ_a I_a e^{−rt/τ_a},    (6)

where I_a = (2 f^pot f^dep) p^∞ (M^pot − M^dep) u_a v_a w and τ_a = 1/q_a. We can then ask: what are the constraints on these quantities, namely the eigenmode initial SNRs, I_a, and time constants, τ_a, implied by the constraints in (4)? We will derive some of these constraints in the next section.

3 Upper bounds on achievable memory capacity

In the previous section, in (3), we described an analytic expression for a memory curve as a function of the structure of a synaptic dynamical system, described by the pair of stochastic transition matrices M^pot/dep. Since the performance measure for memory is an entire memory curve, and not just a single number, there is no universal scalar notion of optimal memory in the space of synaptic dynamical systems. Instead there are tradeoffs between storing proximal and distal memories; in the specific models considered so far [10–12], attempts to increase memory at late (early) times by changing M^pot/dep often incur a performance loss in memory at early (late) times. Thus our end goal, achieved in §4, is to derive an envelope memory curve in the SNR-time plane, a curve that forms an upper bound on the entire memory curve of any model.
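The decomposition (5)-(6) can be carried out explicitly for any model by diagonalizing W^F. A sketch (the choice of the serial model of Figure 1(b) with M = 6 and f^pot = f^dep = 1/2 is ours for illustration; in this balanced case W^F is symmetric, so the left and right eigenvectors coincide and the numerics are well conditioned):

```python
import numpy as np
from scipy.linalg import expm

M, N, f = 6, 100, 0.5
# Serial model: potentiation hops one state right, depression one state left
Mpot, Mdep = np.zeros((M, M)), np.zeros((M, M))
for i in range(M - 1):
    Mpot[i, i + 1] = 1.0
    Mdep[i + 1, i] = 1.0
Mpot[-1, -1] = 1.0   # strong end state stays put under potentiation
Mdep[0, 0] = 1.0     # weak end state stays put under depression
w = np.array([-1.0] * (M // 2) + [1.0] * (M // 2))

WF = f * Mpot + (1 - f) * Mdep - np.eye(M)   # eq. (2); symmetric for f = 1/2
p_inf = np.full(M, 1.0 / M)                  # balanced chain equilibrates uniformly
lam, U = np.linalg.eigh(WF)                  # eigenvalues lam_a = -q_a; columns are u_a
V = U.T                                      # orthonormal U: left eigenvectors are its rows
# mode amplitudes I_a of eq. (6); the stationary mode (lam = 0) gets I_a = 0 automatically
I = np.array([2 * f * (1 - f) * (p_inf @ (Mpot - Mdep) @ U[:, a]) * (V[a] @ w)
              for a in range(M)])
snr_modes = lambda t: np.sqrt(N) * np.sum(I * np.exp(lam * t))   # sum of exponentials
snr_exact = lambda t: np.sqrt(N) * 2 * f * (1 - f) * (p_inf @ (Mpot - Mdep) @ expm(t * WF) @ w)
```

The two ways of evaluating the memory curve agree at all times, and the set {I_a, q_a} is exactly the data constrained by the bounds derived below.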
In order to achieve this goal, in this section, we must first derive upper bounds, over the space of all possible synaptic models, on two different scalar functions of the memory curve: its initial SNR, and the area under the memory curve. In the process of upper-bounding the area, we will develop an essential framework to organize the structure of synaptic dynamical systems based on first passage time theory.

3.1 Bounding initial SNR

We now give an upper bound on the initial SNR,

    SNR(0) = √N (2 f^pot f^dep) p^∞ (M^pot − M^dep) w,    (7)

over all possible models, and also find the class of models that saturate this bound. A useful quantity is the equilibrium probability flux between two disjoint sets of states, A and B:

    Φ_AB = Σ_{i∈A} Σ_{j∈B} r p^∞_i W^F_ij.    (8)

The initial SNR is closely related to the flux from the states with w_i = −1 to those with w_j = +1 (see supplementary material):

    SNR(0) ≤ 4 √N Φ_{−+} / r.    (9)

This inequality becomes an equality if potentiation never decreases the synaptic weight and depression never increases it, which should be a property of any sensible model.

To maximize this flux, potentiation from a weak state must be guaranteed to end in a strong state, and depression must do the reverse. An example of such a model is shown in Figure 2(a,b). These models have a property known as "lumpability" (see [19, §6.3] for the discrete time version and [20, 21] for continuous time). They are completely equivalent to (i.e. have the same memory curve as) a two state model with transition probabilities equal to 1, as shown in Figure 2(c).

Figure 2: Synaptic models that maximize initial SNR.
(a) For potentiation, all transitions starting from a weak state lead to a strong state, and the probabilities for all transitions leaving a given weak state sum to 1. (b) Depression is similar to potentiation, but with strong and weak interchanged. (c) The equivalent two state model, with transition probabilities under potentiation and depression equal to one.

This two state model has the equilibrium distribution p^∞ = (f^dep, f^pot), and its flux is given by Φ_{−+} = r f^pot f^dep. This is maximized when f^pot = f^dep = 1/2, leading to the upper bound:

    SNR(0) ≤ √N.    (10)

We note that while this model has high initial SNR, it also has very fast memory decay, with a timescale τ ~ 1/r. As the synapse is very plastic, the initial memory is encoded very easily, but subsequent memories also overwrite it rapidly. This is one example of the tradeoff between optimizing memory at early versus late times.

3.2 Imposing order on internal states through first passage times

Our goal of understanding the relationship between structure and function in the space of all possible synaptic models is complicated by the fact that this space contains many different possible network topologies, encoded in the nonzero matrix elements of M^pot/dep. To systematically analyze this entire space, we develop an important organizing principle using the theory of first passage times in the stochastic process of forgetting, described by W^F. The mean first passage time matrix, T_ij, is defined as the average time it takes to reach state j for the first time, starting from state i. The diagonal elements are defined to be zero.

A remarkable theorem we will exploit is that the quantity

    η ≡ Σ_j T_ij p^∞_j,    (11)

known as Kemeny's constant (see [19, §4.4]), is independent of the starting state i.
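The flux relation (8)-(10) is easy to verify numerically. A sketch (the helper name `flux_and_snr0` is ours; the test model is the serial chain of Figure 1(b) with M = 6 and f^pot = f^dep = 1/2, for which potentiation never decreases the weight and depression never increases it, so (9) should hold with equality, and both sides should sit below the √N bound of (10)):

```python
import numpy as np

def flux_and_snr0(Mpot, Mdep, w, f=0.5, r=1.0, N=100):
    """Equilibrium flux Phi_{-+} of eq. (8) and initial SNR of eq. (7)."""
    M = Mpot.shape[0]
    WF = f * Mpot + (1 - f) * Mdep - np.eye(M)
    evals, evecs = np.linalg.eig(WF.T)              # stationary distribution
    p_inf = np.real(evecs[:, np.argmin(np.abs(evals))])
    p_inf /= p_inf.sum()
    weak, strong = w < 0, w > 0
    flux = r * sum(p_inf[i] * WF[i, j]              # weak -> strong probability flux
                   for i in range(M) if weak[i]
                   for j in range(M) if strong[j])
    snr0 = np.sqrt(N) * 2 * f * (1 - f) * (p_inf @ (Mpot - Mdep) @ w)
    return flux, snr0

# Serial model with M = 6 states
M = 6
Mpot, Mdep = np.zeros((M, M)), np.zeros((M, M))
for i in range(M - 1):
    Mpot[i, i + 1] = 1.0
    Mdep[i + 1, i] = 1.0
Mpot[-1, -1], Mdep[0, 0] = 1.0, 1.0
w = np.array([-1.0] * 3 + [1.0] * 3)
flux, snr0 = flux_and_snr0(Mpot, Mdep, w)
# For this model SNR(0) = 4 sqrt(N) Phi_{-+} / r exactly, and both lie below sqrt(N)
```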
Intuitively, (11) states that the average time it takes to reach any state, weighted by its equilibrium probability, is independent of the starting state, implying a hidden constancy inherent in any stochastic process. In the context of complex synapses, we can define the partial sums

    η^+_i = Σ_{j∈+} T_ij p^∞_j,   η^−_i = Σ_{j∈−} T_ij p^∞_j.    (12)

These can be thought of as the average times it takes to reach the strong/weak states, respectively. Using these definitions, we can then impose an order on the states by arranging them in order of decreasing η^+_i or increasing η^−_i. Because η^+_i + η^−_i = η is independent of i, the two orderings are the same. In this order, which depends sensitively on the structure of M^pot/dep, states later (to the right in figures below) can be considered to be more potentiated than states earlier (to the left in figures below), despite the fact that they have the same synaptic efficacy. In essence, in this order, a state is considered to be more potentiated if the average time it takes to reach all the strong efficacy states is shorter. We will see that synaptic models that optimize various measures of memory have an exceedingly simple structure when, and only when, their states are arranged in this order.[1]

[1] Note that we do not need to worry about the order of the η^±_i changing during the optimization: necessary conditions for a maximum only require that there is no infinitesimal perturbation that increases the area. Therefore we need only consider an infinitesimal neighborhood of the model, in which the order will not change.

Figure 3: Perturbations that increase the area. (a) Perturbations that increase elements of M^pot above the diagonal and decrease the corresponding elements of M^dep.
It can no longer be used when M^dep is lower triangular, i.e. when depression must move synapses to "more depressed" states. (b) Perturbations that decrease elements of M^pot below the diagonal and increase the corresponding elements of M^dep. It can no longer be used when M^pot is upper triangular, i.e. when potentiation must move synapses to "more potentiated" states. (c) A perturbation that decreases "shortcut" transitions and increases the bypassed "direct" transitions. It can no longer be used when there are only nearest-neighbor "direct" transitions.

3.3 Bounding area

Now consider the area under the memory curve:

    A = ∫_0^∞ dt SNR(t).    (13)

We will find an upper bound on this quantity as well as the model that saturates this bound. The first passage time theory introduced in the previous section becomes useful here because the area has a simple expression in terms of the quantities introduced in (12) (see supplementary material):

    A = √N (4 f^pot f^dep) Σ_ij p^∞_i (M^pot_ij − M^dep_ij) (η^+_i − η^+_j)
      = √N (4 f^pot f^dep) Σ_ij p^∞_i (M^pot_ij − M^dep_ij) (η^−_j − η^−_i).    (14)

With the states in the order described above, we can find perturbations of M^pot/dep that will always increase the area, whilst leaving the equilibrium distribution, p^∞, unchanged. Some of these perturbations are shown in Figure 3; see the supplementary material for details. For example, in Figure 3(a), for two states i on the left and j on the right, with j being more "potentiated" than i (i.e. η^+_i > η^+_j), we have proven that increasing M^pot_ij and decreasing M^dep_ij leads to an increase in area.
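The first passage machinery of §3.2 is computable for any ergodic chain via the fundamental matrix of Kemeny and Snell [19]. A sketch (the helper `mfpt` is ours; for simplicity it uses a discrete-time chain, and the random stochastic matrix is purely illustrative; Kemeny's theorem, the constancy of η_i in (11), holds in the same way for the continuous-time forgetting process):

```python
import numpy as np

def mfpt(P):
    """Mean first passage times T[i, j] of an ergodic chain P, with T[i, i] = 0,
    via the fundamental matrix Z = (I - P + 1 pi)^(-1) (Kemeny & Snell)."""
    M = P.shape[0]
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])   # Perron (stationary) eigenvector
    pi /= pi.sum()
    Z = np.linalg.inv(np.eye(M) - P + np.outer(np.ones(M), pi))
    T = (np.diag(Z)[None, :] - Z) / pi[None, :]         # T_ij = (Z_jj - Z_ij) / pi_j
    return T, pi

# A generic ergodic chain: seeded random stochastic matrix, all entries positive
rng = np.random.default_rng(0)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)
T, pi = mfpt(P)
eta = T @ pi   # eta_i = sum_j T_ij pi_j, eq. (11)
# Kemeny's theorem: eta_i is the same for every starting state i
```

Splitting the sum over j into strong and weak states gives the η^±_i of (12), which is all that (14) needs.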
The only thing that can prevent these perturbations from increasing the area is when they require the decrease of a matrix element that has already been set to 0. This determines the topology (the nonzero transition probabilities) of the model with maximal area. It is of the form shown in Figure 4(c), with potentiation moving one step to the right and depression moving one step to the left. Any other topology would allow some class of perturbations (e.g. those in Figure 3) to further increase the area.

As these perturbations do not change the equilibrium distribution, this means that the area of any model is bounded by that of a linear chain with the same equilibrium distribution. The area of a linear chain model can be expressed directly in terms of its equilibrium state distribution, p^∞, yielding the following upper bound on the area of any model with the same p^∞ (see supplementary material):

    A ≤ (2√N / r) Σ_k [ k − Σ_j j p^∞_j ] p^∞_k w_k = (2√N / r) Σ_k | k − Σ_j j p^∞_j | p^∞_k,    (15)

where we chose w_k = sgn[ k − Σ_j j p^∞_j ]. We can then maximize this by pushing all of the equilibrium distribution symmetrically to the two end states. This can be done by reducing the transition probabilities out of these states, as in Figure 4(c). This makes it very difficult to exit these states once they have been entered.
The resulting area is

    A ≤ √N (M − 1) / r.    (16)

This analytical result is similar to a numerical result found in [18] under a slightly different, information theoretic measure of memory performance.

The "sticky" end states result in very slow decay of memory, but they also make it difficult to encode the memory in the first place, since only a small fraction of synapses are able to change synaptic efficacy during the storage of a new memory. Thus models that maximize area optimize memory at late times, at the expense of early times.

4 Memory curve envelope

Now we will look at the implications of the upper bounds found in the previous section for the SNR at finite times. As argued in (6), the memory curve can be written in the form

    SNR(t) = √N Σ_a I_a e^{−rt/τ_a}.    (17)

The upper bounds on the initial SNR, (10), and the area, (16), imply the following constraints on the parameters {I_a, τ_a}:

    Σ_a I_a ≤ 1,   Σ_a I_a τ_a ≤ M − 1.    (18)

We are not claiming that these are a complete set of constraints: not every set {I_a, τ_a} that satisfies these inequalities will actually be achievable by a synaptic model. However, any set that violates either inequality will definitely not be achievable.

Now we can pick some fixed time, t_0, and maximize the SNR at that time with respect to the parameters {I_a, τ_a}, subject to the constraints above. This always results in a single nonzero I_a; in essence, optimizing memory at a single time requires a single exponential.
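The area bounds (15)-(16) can be checked numerically: the area of (13) follows from a single linear solve, because the prefactor row vector of (3) sums to zero and therefore ignores the constant null direction of W^F. A sketch (our choice of the serial model with M = 6, N = 100, f^pot = f^dep = 1/2 and r = 1 is for illustration only):

```python
import numpy as np

M, N, f, r = 6, 100, 0.5, 1.0
# Serial model and its weights
Mpot, Mdep = np.zeros((M, M)), np.zeros((M, M))
for i in range(M - 1):
    Mpot[i, i + 1] = 1.0
    Mdep[i + 1, i] = 1.0
Mpot[-1, -1], Mdep[0, 0] = 1.0, 1.0
w = np.array([-1.0] * 3 + [1.0] * 3)

WF = f * Mpot + (1 - f) * Mdep - np.eye(M)
p_inf = np.full(M, 1.0 / M)   # uniform equilibrium for this balanced chain
pref = np.sqrt(N) * 2 * f * (1 - f) * (p_inf @ (Mpot - Mdep))
# Area = integral of pref exp(r t W_F) w dt: any particular solution of
# W_F x = -w works, since pref annihilates the constant null direction
x = np.linalg.lstsq(WF, -w, rcond=None)[0]
area = float(pref @ x) / r

k = np.arange(M)
rhs_15 = (2 * np.sqrt(N) / r) * np.sum(np.abs(k - k @ p_inf) * p_inf)   # eq. (15)
rhs_16 = np.sqrt(N) * (M - 1) / r                                       # eq. (16)
```

For this uniform-equilibrium chain the bound (15) evaluates to 30, comfortably below the universal limit (16) of √N (M − 1)/r = 50.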
The resulting optimal memory curve, along with the achieved memory at the chosen time, depends on t_0 as follows:

    t_0 ≤ (M − 1)/r   ⟹   SNR(t) = √N e^{−rt/(M−1)}   ⟹   SNR(t_0) = √N e^{−r t_0/(M−1)},
    t_0 ≥ (M − 1)/r   ⟹   SNR(t) = [√N (M − 1)/(r t_0)] e^{−t/t_0}   ⟹   SNR(t_0) = √N (M − 1)/(e r t_0).    (19)

Both the initial SNR bound and the area bound are saturated at early times; at late times, only the area bound is saturated. The function SNR(t_0), the green curve in Figure 4(a), forms a memory curve envelope with late-time power-law decay ~ 1/t_0. No synaptic model can have an SNR that is greater than this at any time. We can use this to find an upper bound on the memory lifetime, τ(ϵ), by finding the point at which the envelope crosses the threshold SNR ϵ:

    τ(ϵ) ≤ √N (M − 1) / (ϵ e r),    (20)

where we assume N > (ϵe)^2. Intriguingly, both the lifetime and the memory envelope expand linearly with the number of internal states M, and increase as the square root of the number of synapses N.

This leaves the question of whether this bound is achievable: at any given time, can we find a model whose memory curve touches the envelope? The red curves in Figure 4(a) show the closest we have come to the envelope with actual models, by repeated numerical optimization of SNR(t_0) over M^pot/dep with random initialization, and by hand designed models.

We see that at early, but not late, times there is a gap between the upper bound that we can prove and what we can achieve with actual models. There may be other models we haven't found that could beat the ones we have and come closer to our proven envelope. However, we suspect that the area constraint is not the bottleneck for optimizing memory at times less than O(M/r).
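The envelope (19) is straightforward to evaluate, and any concrete model's curve should sit below it. A sketch (the function name `envelope` is ours) comparing the envelope for M = 2, N = 100, r = 1 against the binary synapse of §3.1, whose curve is √N e^{−rt}; the two coincide for all t_0 ≤ (M − 1)/r and separate afterwards:

```python
import numpy as np

def envelope(t0, M, N=100, r=1.0):
    """Memory-curve envelope SNR(t0) of eq. (19)."""
    t0 = np.asarray(t0, dtype=float)
    early = np.sqrt(N) * np.exp(-r * t0 / (M - 1))    # initial-SNR bound active
    late = np.sqrt(N) * (M - 1) / (np.e * r * t0)     # area bound active
    return np.where(t0 <= (M - 1) / r, early, late)

# Binary synapse (M = 2): SNR(t) = sqrt(N) exp(-r t); it saturates the
# envelope for t <= (M - 1)/r and stays strictly below it at later times
t = np.linspace(0.01, 10.0, 200)
model = np.sqrt(100) * np.exp(-t)
env = envelope(t, M=2)
```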
We believe there is some other constraint that prevents models from approaching the envelope, and we are currently exploring several mathematical conjectures for the precise form of this constraint, in order to obtain a potentially tighter envelope. Nevertheless, we have proven rigorously that no model's memory curve can ever exceed this envelope, and that it is at least tight at late times, longer than O(M/r), where models of the form in Figure 4(c) can come close to the envelope.

Figure 4: The memory curve envelope for N = 100, M = 12. (a) An upper bound on the SNR at any time is shown in green. The red dashed curve shows the result of numerical optimization of synaptic models with random initialization. The solid red curve shows the highest SNR we have found with hand designed models. At early times these models are of the form shown in (b), with different numbers of states and all transition probabilities equal to 1. At late times they are of the form shown in (c), with different values of ε. The model shown in (c) also saturates the area bound (16) in the limit ε → 0.

5 Discussion

We have initiated the development of a general theory of learning and memory with complex synapses, allowing for an exploration of the entire space of complex synaptic models, rather than analyzing individual models one at a time. In doing so, we have obtained several new mathematical results delineating the functional limits of memory achievable by synaptic complexity, and the structural characterization of synaptic dynamical systems that achieve these limits.
In particular, operating within the ideal observer framework of [10, 11, 18], we have shown that for a population of N synapses with M internal states: (a) the initial SNR of any synaptic model cannot exceed √N, and any model that achieves this bound is equivalent to a binary synapse; (b) the area under the memory curve of any model cannot exceed that of a linear chain model with the same equilibrium distribution; (c) both the area and the memory lifetime of any model cannot exceed O(√N M), and the model that achieves this limit has a linear chain topology with only nearest neighbor transitions; (d) we have derived an envelope memory curve in the SNR-time plane that cannot be exceeded by the memory curve of any model, and models that approach this envelope for times greater than O(M/r) are linear chain models; and (e) this late-time envelope is a power law proportional to O(√N M / (r t)), indicating that synaptic complexity can strongly enhance the limits of achievable memory.

This theoretical study opens up several avenues for further inquiry. In particular, the tightness of our envelope for early times, less than O(M/r), remains an open question, and we are currently pursuing several conjectures. We have also derived memory constrained envelopes, by asking, in the space of models that achieve a given SNR at a given time, what is the maximal SNR achievable at other times. If these two times are beyond a threshold separation, optimal constrained models require two exponentials.
It would be interesting to systematically analyze the space of models that achieve good memory at multiple times, to understand their structural organization, and to see how they give rise to multiple exponentials, leading to power-law memory decays.

Finally, it would be interesting to design physiological experiments in order to perform optimal systems identification of potential Markovian dynamical systems hiding within biological synapses, given measurements of pre- and post-synaptic spike trains along with changes in post-synaptic potentials. Then, given our theory, we could match this measured synaptic model to optimal models to understand for which timescales of memory, if any, biological synaptic dynamics may be tuned.

In summary, we hope that a deeper theoretical understanding of the functional role of synaptic complexity, initiated here, will help advance our understanding of the neurobiology of learning and memory, aid in the design of engineered memory circuits, and lead to new mathematical theorems about stochastic processes.

Acknowledgements

We thank the Sloan, Genentech, Burroughs-Wellcome, and Swartz foundations for support. We thank Larry Abbott, Marcus Benna, Stefano Fusi, Jascha Sohl-Dickstein and David Sussillo for useful discussions.

References

[1] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Natl. Acad. Sci. U.S.A. 79 (1982) no. 8, 2554–2558.

[2] D. J. Amit, H. Gutfreund, and H. Sompolinsky, “Spin-glass models of neural networks,” Phys. Rev. A 32 (Aug, 1985) 1007–1018.

[3] E. Gardner, “The space of interactions in neural network models,” Journal of Physics A: Mathematical and General 21 (1988) no. 1, 257.

[4] T. V. P. Bliss and G. L.
Collingridge, “A synaptic model of memory: long-term potentiation in the hippocampus,” Nature 361 (Jan, 1993) 31–39.

[5] C. C. H. Petersen, R. C. Malenka, R. A. Nicoll, and J. J. Hopfield, “All-or-none potentiation at CA3-CA1 synapses,” Proc. Natl. Acad. Sci. U.S.A. 95 (1998) no. 8, 4732–4737.

[6] D. H. O’Connor, G. M. Wittenberg, and S. S.-H. Wang, “Graded bidirectional synaptic plasticity is composed of switch-like unitary events,” Proc. Natl. Acad. Sci. U.S.A. 102 (2005) no. 27, 9679–9684.

[7] R. Enoki, Y.-L. Hu, D. Hamilton, and A. Fine, “Expression of Long-Term Plasticity at Individual Synapses in Hippocampus Is Graded, Bidirectional, and Mainly Presynaptic: Optical Quantal Analysis,” Neuron 62 (2009) no. 2, 242–253.

[8] D. J. Amit and S. Fusi, “Constraints on learning in dynamic synapses,” Network: Computation in Neural Systems 3 (1992) no. 4, 443–464.

[9] D. J. Amit and S. Fusi, “Learning in neural networks with material synapses,” Neural Computation 6 (1994) no. 5, 957–982.

[10] S. Fusi, P. J. Drew, and L. F. Abbott, “Cascade models of synaptically stored memories,” Neuron 45 (Feb, 2005) 599–611.

[11] S. Fusi and L. F. Abbott, “Limits on the memory storage capacity of bounded synapses,” Nat. Neurosci. 10 (Apr, 2007) 485–493.

[12] C. Leibold and R. Kempter, “Sparseness Constrains the Prolongation of Memory Lifetime via Synaptic Metaplasticity,” Cerebral Cortex 18 (2008) no. 1, 67–77.

[13] D. S. Bredt and R. A. Nicoll, “AMPA Receptor Trafficking at Excitatory Synapses,” Neuron 40 (2003) no. 2, 361–379.

[14] M. P. Coba, A. J. Pocklington, M. O. Collins, M. V. Kopanitsa, R. T. Uren, S. Swamy, M. D. Croning, J. S. Choudhary, and S. G.
Grant, “Neurotransmitters drive combinatorial multistate postsynaptic density networks,” Sci. Signal. 2 (2009) no. 68, ra19.

[15] W. C. Abraham and M. F. Bear, “Metaplasticity: the plasticity of synaptic plasticity,” Trends in Neurosciences 19 (1996) no. 4, 126–130.

[16] J. M. Montgomery and D. V. Madison, “State-Dependent Heterogeneity in Synaptic Depression between Pyramidal Cell Pairs,” Neuron 33 (2002) no. 5, 765–777.

[17] R. D. Emes and S. G. Grant, “Evolution of Synapse Complexity and Diversity,” Annual Review of Neuroscience 35 (2012) no. 1, 111–131.

[18] A. B. Barrett and M. C. van Rossum, “Optimal learning rules for discrete synapses,” PLoS Comput. Biol. 4 (Nov, 2008) e1000230.

[19] J. Kemeny and J. Snell, Finite Markov Chains. Springer, 1960.

[20] C. Burke and M. Rosenblatt, “A Markovian function of a Markov chain,” The Annals of Mathematical Statistics 29 (1958) no. 4, 1112–1122.

[21] F. Ball and G. F. Yeo, “Lumpability and Marginalisability for Continuous-Time Markov Chains,” Journal of Applied Probability 30 (1993) no. 3, 518–528.