{"title": "A Rigorous Analysis of Linsker-type Hebbian Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 319, "page_last": 326, "abstract": null, "full_text": "A Rigorous Analysis Of Linsker-type Hebbian Learning \n\nJ. Feng \nMathematical Department \nUniversity of Rome \"La Sapienza\" \nP.le A. Moro, 00185 Rome, Italy \nfeng@at.uniroma1.it \n\nH. Pan, V. P. Roychowdhury \nSchool of Electrical Engineering \nPurdue University \nWest Lafayette, IN 47907 \nhpan@ecn.purdue.edu, vwani@drum.ecn.purdue.edu \n\nAbstract \n\nWe propose a novel rigorous approach for the analysis of Linsker's unsupervised Hebbian learning network. The behavior of this model is determined by the underlying nonlinear dynamics, which are parameterized by a set of parameters originating from the Hebbian rule and the arbor density of the synapses. These parameters determine the presence or absence of a specific receptive field (also referred to as a 'connection pattern') as a saturated fixed point attractor of the model. In this paper, we perform a qualitative analysis of the underlying nonlinear dynamics over the parameter space, determine the effects of the system parameters on the emergence of various receptive fields, and predict precisely within which parameter regime the network will have the potential to develop a specially designated connection pattern. In particular, this approach exposes, for the first time, the crucial role played by the synaptic density functions, and provides a complete and precise picture of the parameter space that defines the relationships among the different receptive fields. Our theoretical predictions are confirmed by numerical simulations. \n\n1 Introduction \n\nFor the purpose of understanding the self-organization mechanism of the primary visual system, Linsker has proposed a multilayered unsupervised Hebbian learning network with random uncorrelated inputs and localized arborization of synapses between adjacent layers (Linsker, 1986 & 1988). His simulations have shown that, for appropriate parameter regimes, several structured connection patterns (e.g., centre-surround and oriented afferent receptive fields (aRFs)) occur progressively as the Hebbian evolution of the weights is carried out layer by layer. The behavior of Linsker's model is determined by the underlying nonlinear dynamics, which are parameterized by a set of parameters originating from the Hebbian rule and the arbor density of the synapses. In a nonlinear system, several attractors usually coexist for the same set of system parameters. That is, for a given set of parameters, the state space comprises several attractive basins, each corresponding to a steady state. The initial condition determines which attractor will eventually be reached. At the same time, a nonlinear system can have a different group of coexisting attractors for a different set of system parameters. That is, one can control the presence or absence of a specific state as a fixed point attractor by varying the parameters. For a development model like Linsker's network, what is expected is that the different aRFs should emerge under different sets of parameters but should be relatively insensitive to the initial conditions. In other words, the dynamics should avoid the coexistence of several attractors in an appropriate way. The purpose of this paper is to gain more insight into the dynamical mechanism of this self-organization model by performing a rigorous analysis of its parameter space without any approximation.
That is, our goal is to reveal the effects of the system parameters on the stability of aRFs, and to predict precisely within which parameter regime the network will have the potential to develop a specially designated aRF. The novel rigorous approach presented here applies not only to Linsker-type Hebbian learning but also to other related self-organization models of neural development. \n\nIn Linsker's network, each cell in the present layer M receives synaptic inputs from a number of cells in the preceding layer 𝓛. The density of these synaptic connections decreases monotonically with the distance r_𝓛 from the point underlying the M-cell's position. Since the synaptic weights change on a long time scale compared to the variation of the random inputs, by averaging the Hebb rule over the ensemble of inputs in layer 𝓛, the dynamical equation for the development of the synaptic strength w_T(i) between an M-cell and the i-th 𝓛-cell at time T is \n\nw_{T+1}(i) = f{w_T(i) + k_1 + Σ_{j=1}^{N_𝓛} [Q^𝓛_{ij} + k_2] r(j) w_T(j)},   (1) \n\nwhere k_1, k_2 are system parameters which are particular combinations of the constants of the Hebb rule, r(·) is a non-negative normalized synaptic density function (SDF)¹ with Σ_{i∈𝓛} r(i) = 1, and f(·) is a limiter function defined by f(x) = w_max if x > w_max; f(x) = x if |x| ≤ w_max; and f(x) = -w_max if x < -w_max. The covariance matrix {Q^𝓛_{ij}} of the layer 𝓛 describes the correlation of the activities of the i-th and the j-th 𝓛-cells. Actually, the covariance matrix of each layer is determined by the SDFs r(·) of all layers preceding the layer under consideration. \n\n¹The SDF is explicitly incorporated into the dynamics (1), which is equivalent to Linsker's formulation. A rigorous explanation of this equivalence is given in MacKay & Miller, 1990. \n\nThe idea of this paper is the following.
It is well known that it is in general intractable to characterize the behavior of a nonlinear dynamics, since the nonlinearity is the cause of the coexistence of many attractors, and it is difficult to obtain a complete characterization of the attractive basins in the state space. But in some cases it is relatively easy to derive a necessary and sufficient condition for checking whether a given state is a fixed point of the dynamics. In terms of this condition, the whole parameter regime for the emergence of a fixed point of the dynamics may be obtained in the parameter space. If we are further able to prove the stability of the fixed point, which implies that this fixed point is a steady state whenever the initial condition lies in a nonempty vicinity of it in the state space, we can assert the occurrence of this fixed point attractor in that parameter regime. For Linsker's network, fortunately, the above idea can be carried out because of the specific form of the nonlinear function f(·). Due to space limitations, the rigorous proofs are given in (Feng, Pan, & Roychowdhury, 1995). \n\n2 The Set Of Saturated Fixed Point Attractors And The Criterion For The Division Of Parameter Regimes \n\nIn fact, Linsker's model is a system of first-order nonlinear difference equations, taking the form \n\nw_{T+1}(i) = f[w_T(i) + h_i(w_T, k_1, k_2)],   w_T = {w_T(j), j = 1, ..., N_𝓛},   (2) \n\nwhere h_i(w_T, k_1, k_2) = k_1 + Σ_{j=1}^{N_𝓛} [Q^𝓛_{ij} + k_2] r(j) w_T(j). The aRFs observed in Linsker's simulations are the saturated fixed point attractors of this nonlinear system (2).
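As a concrete illustration of the dynamics (2), the following sketch iterates the update under the limiter f until the weights saturate, then checks the sign criterion derived below. The covariance matrix Q, the SDF r, the layer size, and the values of k_1, k_2 are illustrative assumptions, not Linsker's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)
N, w_max = 8, 1.0
Q = np.eye(N) * 0.5 + 0.5          # toy covariance matrix (assumption)
r = np.full(N, 1.0 / N)            # uniform normalized SDF (assumption)
k1, k2 = 0.1, -0.2                 # illustrative Hebb-rule parameters

def limiter(x, w_max=1.0):
    """The function f: clip the weights to the hypercube [-w_max, w_max]."""
    return np.clip(x, -w_max, w_max)

def h(w, k1, k2):
    """h_i(w, k1, k2) = k1 + sum_j [Q_ij + k2] r(j) w(j)."""
    return k1 + (Q + k2) @ (r * w)

w = rng.uniform(-0.1, 0.1, N)      # small random initial weights
for _ in range(200):
    w = limiter(w + h(w, k1, k2), w_max)

# At a saturated fixed point every weight sits at +/- w_max and
# shares the sign of h_i (the criterion of Theorems 1 & 2 below).
assert np.all(np.abs(w) == w_max)
assert np.all(w * h(w, k1, k2) > 0)
```

With these toy parameters the iteration converges to the all-excitatory pattern; other choices of (k1, k2) drive it to different corners of the hypercube.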
Since the limiter function f(·) is defined on a hypercube Ω = [-w_max, w_max]^{N_𝓛} in weight state space, within which the dynamics is dominated by the linear system w_{T+1}(i) = w_T(i) + h_i(w_T, k_1, k_2), the short-time behavior of the evolution dynamics of connection patterns can be fully characterized in terms of the properties of the eigenvectors and their eigenvalues. But this method of stability analysis will not be suitable for the long-time evolution of equation (1) or (2) once the hypercube constraint is reached, as the first largest component of w reaches saturation. However, it is well known that a fixed point or equilibrium state of the dynamics (2) satisfies \n\nw(i) = f[w(i) + h_i(w, k_1, k_2)].   (3) \n\nBecause of the special form of the nonlinear function f(·), the fixed point equation (3) implies that there exists T̄ such that for T > T̄, \n\n|w_T(i) + h_i(w_T, k_1, k_2)| ≥ w_max, \n\nif h_i(w, k_1, k_2) ≠ 0. So a saturated fixed point w(i) must have the same sign as h_i(w, k_1, k_2), i.e., \n\nw(i) h_i(w, k_1, k_2) > 0. \n\nBy using the above idea, our Theorems 1 & 2 (proven in Feng, Pan, & Roychowdhury, 1995) state that the set of saturated fixed point attractors of the dynamics in equation (1) is given by \n\nΩ_FP = {w | w(i) h_i(w, k_1, k_2) > 0, 1 ≤ i ≤ N_𝓛}, \n\nand every w ∈ Ω_FP is stable, where the weight vector w belongs to the set of all extreme points of the hypercube Ω (we assume w_max = 1 without loss of generality). \n\nWe next derive an explicit necessary and sufficient condition for the emergence of structured aRFs, i.e., we derive conditions to determine whether a given w belongs to Ω_FP. Define J+(w) = {i | w(i) = 1} as the index set of cells at the preceding layer 𝓛 with excitatory weight for a connection pattern w, and J-(w) = {i | w(i) = -1} as the index set of 𝓛-cells with inhibitory weight for w.
Note from the property of fixed point attractors that a connection pattern w is an attractor of the dynamics (1) if and only if for i ∈ J+(w) we have \n\nw(i){k_1 + Σ_j [Q^𝓛_{ij} + k_2] r(j) w(j)} = w(i){k_1 + Σ_{j∈J+(w)} [Q^𝓛_{ij} + k_2] r(j) w(j) + Σ_{j∈J-(w)} [Q^𝓛_{ij} + k_2] r(j) w(j)} > 0. \n\nBy the definition of J+(w) and J-(w), we deduce from the above inequality that \n\nk_1 + Σ_{j∈J+(w)} [Q^𝓛_{ij} + k_2] r(j) - Σ_{j∈J-(w)} [Q^𝓛_{ij} + k_2] r(j) > 0, \n\nnamely \n\nk_1 + k_2 [Σ_{j∈J+(w)} r(j) - Σ_{j∈J-(w)} r(j)] > Σ_{j∈J-(w)} Q^𝓛_{ij} r(j) - Σ_{j∈J+(w)} Q^𝓛_{ij} r(j). \n\nThe inequality above is satisfied for all i in J+(w), and the left-hand side is independent of i. Hence, \n\nk_1 + k_2 [Σ_{j∈J+(w)} r(j) - Σ_{j∈J-(w)} r(j)] > max_{i∈J+(w)} [Σ_{j∈J-(w)} Q^𝓛_{ij} r(j) - Σ_{j∈J+(w)} Q^𝓛_{ij} r(j)]. \n\nOn the other hand, for i ∈ J-(w) we can similarly deduce that \n\nk_1 + k_2 [Σ_{j∈J+(w)} r(j) - Σ_{j∈J-(w)} r(j)] < min_{i∈J-(w)} [Σ_{j∈J-(w)} Q^𝓛_{ij} r(j) - Σ_{j∈J+(w)} Q^𝓛_{ij} r(j)]. \n\nWe introduce the slope function \n\nc(w) := Σ_{j∈J+(w)} r(j) - Σ_{j∈J-(w)} r(j), \n\nwhich is the difference of the sums of the SDF r(·) over J+(w) and J-(w), and two k_1-intercept functions: \n\nd_1(w) := max_{i∈J+(w)} [Σ_{j∈J-(w)} Q^𝓛_{ij} r(j) - Σ_{j∈J+(w)} Q^𝓛_{ij} r(j)] if J+(w) ≠ ∅, and d_1(w) := -∞ if J+(w) = ∅; \n\nd_2(w) := min_{i∈J-(w)} [Σ_{j∈J-(w)} Q^𝓛_{ij} r(j) - Σ_{j∈J+(w)} Q^𝓛_{ij} r(j)] if J-(w) ≠ ∅, and d_2(w) := +∞ if J-(w) = ∅. \n\nFigure 1: The parameter subspace of (k_1, k_2). (a) Parameter regimes of (k_1, k_2) that ensure the emergence of the all-excitatory (regime A) and the all-inhibitory (regime B) connection patterns. The dark grey regime C is the coexistence regime for both the all-excitatory and the all-inhibitory connection patterns. The regime D without texture is the regime that Linsker's simulation results are based on, in which neither the all-excitatory nor the all-inhibitory connection pattern is an attractor. (b) The principal parameter regimes.
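The slope and k_1-intercept functions just introduced are directly computable from Q, r, and a candidate pattern w, and together they decide membership in the stable set. A minimal sketch (the Q, r, and pattern below are toy stand-ins, not the layer covariances of the paper):

```python
import numpy as np

def band(w, Q, r):
    """Slope c(w) and k1-intercepts d1(w), d2(w) for a candidate
    saturated pattern w in {-1, +1}^N, following the definitions above."""
    Jp, Jm = w > 0, w < 0
    c = r[Jp].sum() - r[Jm].sum()
    # g_i = sum_{j in J-} Q_ij r(j) - sum_{j in J+} Q_ij r(j)
    g = Q[:, Jm] @ r[Jm] - Q[:, Jp] @ r[Jp]
    d1 = g[Jp].max() if Jp.any() else -np.inf
    d2 = g[Jm].min() if Jm.any() else np.inf
    return c, d1, d2

def is_attractor(w, Q, r, k1, k2):
    """w is a stable saturated fixed point iff d2(w) > k1 + c(w)*k2 > d1(w)."""
    c, d1, d2 = band(w, Q, r)
    return d2 > k1 + c * k2 > d1

# Toy example: with Q = I and a uniform SDF, the all-excitatory
# pattern is an attractor for suitable (k1, k2).
N = 6
Q, r = np.eye(N), np.full(N, 1.0 / N)
w_exc = np.ones(N)
print(is_attractor(w_exc, Q, r, k1=0.1, k2=0.0))   # prints True
```

Shifting k_1 out of the band (e.g. k1=-1.0 here) makes the same pattern lose its regime, which is exactly the geometric picture of the parallel-line bands discussed next.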
\n\nNow from our Theorem 3 in Feng, Pan, & Roychowdhury, 1995, for every layer of \nLinsker's network, the new rigorous criterion for the division of stable parameter \nregimes to ensure the development of various structured connection patterns is \n\nd2(w) > ki + c(w)k2 > dl(w). \n\nThat is, for a given SDF r(-), the parameter regime of (kl' k2) to ensure that w is a \nstable attractor of dynamics (1) is a band between two parallel lines ki + c(w)k2 > \ndl(w) and ki +c(w)k2 < d2(w) (See regimes E and F in Fig.1(b)). It is noticed that \nas dl(w) > d2 (w), there is no regime of (kl' k2) for the occurrence of that aRF w as \nan attractor of equation (1). Therefore, the existence of such a structured aRF w as \nan attractor of equation (1) is determined by k1-intercept functions dIe) and d2 (\u00b7), \nand therefore by the covariance matrix Q.c or SDFs r(\u00b7) of all preceding layers. \n\n3 Parameter Regimes For aRFs Between Layers BAnd C \n\nBased on our general theorems applicable to all layers, we mainly focus on describing \nthe stabilization process of synaptic development from the 2nd (B) to the 3rd layer \n(C) by considering the effect of the system parameters on the weight development. \nFor the sake of convenience, we assume that the input at 1st layer (A) is independent \nnormal distribution with mean 0 and variance 1, and the connection strengths from \nlayer A to B are all-excitatory same as in Linsker's simulations. The emergence of \nvarious aRFs between layer Band C have been previously studied in the literature, \nand in this paper we mention only the following new results made possible by our \napproach: \n\n(1) For the cell in layer C, the all-excitatory and the all-inhibitory connection \npatterns still have the largest stable regimes. Denote both SDFs from layer A to B \nand from B to C as rAB ( .,. ) and r BC (-) respectively. The parameter plane of (kl' k2) \n\n\f324 \n\nlianfeng F eng, H. Pan, V. P. 
Roychowdhury \n\nTable 1: The Principal Parameter Regimes \n\nATTRACTOR \nAll-excitatory aRF \n\nAll-inhibitory aRF \n\nAll-excitatory and all-inhibitory \naRFs coexist \nThe structured aRFs may have \nseparate parameter regimes \nAny connection pattern in which \nthe excitatory connections \nconstitute the majority \nAny connection pattern in which \nthe inhibitory connections \nconstitute the majority \nA small coexistence regime of \nmany connection patterns around \nthe origin point of the parameter \nplane of ( kl , k2) \n\nTYPE \nRegime A \n\nRegime B \n\nRegime \n=AnB \n\nRegime F \n\nd2{w) > kl + C{W)k2 > d1(w) \nwhere \n\nc(w) < 0 \n\nRegime G \n=EnFnAnB \n\nd2(W 1) > kl + c(w1)k2 > d1 (wI) \n\nis divided into four regimes by \n\nfor all-excitatory pattern and \n\nfor all-inhibitory pattern (See Fig.I(a\u00bb. \n(2) The parameter with large and negative k2 and approximately -1 < -kdk2 < 1 \nis favorable for the emergence of various structured connection patterns (e.g., ON(cid:173)\ncenter cells, OFF-center cells, bi-Iobed cells, and oriented cells) . This is because \nthis regime (See regime D in Fig.I) is removed from the parameter regime where \nboth all-excitatory and all-inhibitory aRFs are dominant, including the coexistence \nregime of many kind of at tractors around the origin point (See regime G in Fig.I(b \u00bb. \nThe above results provide a precise picture about the principal parameter regimes \nsummarized in Table 1. \n(3) The relative size of the radiuses of two SDFs r AS (-,.) and r Sc (-) plays a key role \nin the evolution of various structured aRFs from B to C. A given SDF r.cM (i, j), i E \nM, j E e will be said to have a range r M if r.cM (i, j) is 'sufficient small' for lIi-jll ~ \nrM. For a Gaussian SDF r.cM(j,k) '\" exp(-lIj-kll/r~), j E e,k EM, the range \nr M is its standard deviation. 
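For concreteness, a normalized Gaussian SDF of the kind just described can be sketched on a 1-D grid of 𝓛-cells centred under an M-cell; the 1-D geometry and the grid size are simplifying assumptions for illustration only:

```python
import numpy as np

def gaussian_sdf(positions, r_M):
    """Normalized Gaussian SDF with range r_M: density ~ exp(-d^2/r_M^2),
    rescaled so that sum_i r(i) = 1 as required by the dynamics."""
    density = np.exp(-(positions.astype(float) ** 2) / r_M ** 2)
    return density / density.sum()

cells = np.arange(-20, 21)        # 1-D cell positions (assumption)
r = gaussian_sdf(cells, r_M=4.0)
assert np.isclose(r.sum(), 1.0)   # the SDF is normalized
assert r[20] == r.max()           # density peaks directly under the M-cell
```

Increasing r_M flattens this profile toward a uniform (fully feedforward) arbor, the "largest extreme" discussed below.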
We give an analytic prediction of the influence of the SDF ranges r_B, r_C on the dynamics by changing r_B from its smallest extreme to its largest one with respect to r_C. For the smallest extreme of r_B (i.e., the synaptic connections from A to B are concentrated enough, while those from layer B to C are fully feedforward connected), we proved that every kind of connection pattern has a stable parameter regime and emerges under certain parameters, because each synaptic connection within an aRF develops independently. As r_B is changed from the smallest to the largest extreme, the development of the synaptic connections between layers B and C depends more and more strongly on one another, in the sense that most connections have the same sign as their neighbors in an aRF. So for the largest extreme of r_B (i.e., the weights from layer A to B are fully feedforward, with no constraint on the SDF r_BC(·)), no structured aRF except the all-excitatory and the all-inhibitory connection patterns will ever arise, although correlations exist in the input activities (for a proof see Feng, Pan, & Roychowdhury, 1995). Therefore, without a localized SDF, there would be no structured covariance matrix Q̃ = {[Q_ij + k_2] r(j)}, which embodies the localized correlation in afferent activities. And without a structured covariance matrix Q̃, no structured aRFs would emerge. \n\n(4) As another application of our analysis, we present several numerical results on the parameter regimes of (k_1, k_2, r_B, r_C) for the formation of various structured aRFs (Feng & Pan, 1993; Feng, Pan, & Roychowdhury, 1995), where we assume that r_AB(i,j) ∼ exp(-||i - j||²/r_B²), i ∈ B, j ∈ A, and r_BC(i) ∼ exp(-||i||²/r_C²), i ∈ B, as in (Linsker, 1986 & 1988). For example, we show that various aRFs as attractors have different relative stability.
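The smallest-extreme claim in result (3) lends itself to the kind of numerical check used in result (4). The sketch below builds the layer-B covariance for iid unit-variance A-inputs with all-excitatory A-to-B weights, normalizes it to unit diagonal (a common normalization, assumed here rather than taken from the paper), and verifies that for a very concentrated r_AB the band d_1(w) < d_2(w) is nonempty for a bi-lobed candidate pattern; the 1-D geometry and all sizes are illustrative:

```python
import numpy as np

cells = np.arange(16).astype(float)       # 1-D layer (assumption)
r_B = 0.05                                # very concentrated A->B SDF range

# Gaussian A->B SDF, one normalized row per B-cell.
r_AB = np.exp(-(cells[:, None] - cells[None, :]) ** 2 / r_B ** 2)
r_AB /= r_AB.sum(axis=1, keepdims=True)

# Covariance of B activities for iid unit-variance A inputs and
# all-excitatory A->B weights, rescaled to unit diagonal.
C = r_AB @ r_AB.T
Q = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))

r_BC = np.exp(-cells ** 2 / 10.0 ** 2)    # B->C SDF with range r_C = 10
r_BC /= r_BC.sum()

w = np.where(cells < 8, 1.0, -1.0)        # a bi-lobed candidate pattern
Jp, Jm = w > 0, w < 0
g = Q[:, Jm] @ r_BC[Jm] - Q[:, Jp] @ r_BC[Jp]
d1, d2 = g[Jp].max(), g[Jm].min()

assert np.allclose(Q, np.eye(16))         # concentrated r_AB: Q is near-identity
assert d1 < d2                            # a nonempty (k1, k2) band exists
```

Re-running the same check with a growing r_B narrows the band, which is how critical values of r_B can be located numerically.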
For a fixed r_C, the SDF range r_B of the preceding layer, acting as the third system parameter, has various critical values for the different attractors. That is, an attractor will no longer be stable if r_B exceeds its corresponding critical value (see Fig. 2). For circularly symmetric ON-center cells, those aRFs with a large ON-center core (which have a positive or small negative slope value c(w) ≈ -k_1/k_2) always have a stable parameter regime. But for those ON-center cells with a large negative slope value c(w), their stable parameter regimes decrease in size with c(w). Similarly, circularly symmetric OFF-center cells with a large OFF-center core (which have a negative or small positive slope value c(w)) will be more stable than those with a large positive average of weights. But for non-circularly-symmetric patterns (e.g., bi-lobed cells and oriented cells), only those attractors with zero average synaptic strength might always have a stable parameter regime (see regime H in Fig. 1(b)). If the third parameter r_B is large enough to exceed its critical values for the other aRFs, and k_2 is large and negative, then ON-center aRFs with positive c(w) and OFF-center aRFs with negative c(w) will be almost the only attractors in regime D∩E and regime D∩F, respectively. This conclusion makes clear why we usually obtain ON-center aRFs in regime D∩E and OFF-center aRFs in regime D∩F much more easily than other patterns. \n\n4 Concluding Remarks \n\nOne advantage of our rigorous approach to this kind of unsupervised Hebbian learning network is that, without approximation, it unifies the treatment of many diverse problems about dynamical mechanisms. It is important to notice that there is no assumption on the second term h_i(w_T) on the right-hand side of equation (1), and there is no restriction on the matrix Q.
Our Theorems 1 and 2 provide the general framework for the description of the fixed point attractors of any difference equation of the type stated in (2) that uses a limiter function. Depending on the structure of the second term h_i(w_T), it is not difficult to adapt our Theorem 3 to obtain the precise relationship among the system parameters in other kinds of models, as long as f(·) is a limiter function. Since the functions in the necessary and sufficient condition are computable (like our slope and k_1-intercept functions), one is always able to check whether a designated fixed point is stable for a specific set of parameters. \n\nFigure 2: The critical values of the SDF's range r_B for different connection patterns. (The panels show ON-center, oriented, OFF-center, and bi-lobed aRFs, each with r_C = 10.) \n\nAcknowledgements \n\nThe work of V. P. Roychowdhury and H. Pan was supported in part by the General Motors Faculty Fellowship and by NSF Grant No. ECS-9308814. J. Feng was partially supported by the Chinese National Key Project of Fundamental Research \"Climbing Program\" and by the CNR of Italy. \n\nReferences \n\nR. Linsker. (1986) From basic network principles to neural architecture (series). Proc. Natl. Acad. Sci. USA 83: 7508-7512, 8390-8394, 8779-8783. \nR. Linsker. (1988) Self-organization in a perceptual network. Computer 21(3): 105-117. \nD. MacKay & K. Miller. (1990) Analysis of Linsker's application of Hebbian rules to linear networks. Network 1: 257-297. \nJ. Feng & H. Pan. (1993) Analysis of Linsker-type Hebbian learning: Rigorous results. Proc. 1993 IEEE Int. Conf. on Neural Networks, San Francisco, Vol. III, 1516-1521. Piscataway, NJ: IEEE. \nJ. Feng, H. Pan, & V. P. Roychowdhury. (1995) Linsker-type Hebbian learning: A qualitative analysis on the parameter space. (submitted).", "award": [], "sourceid": 914, "authors": [{"given_name": "J.", "family_name": "Feng", "institution": null}, {"given_name": "H.", "family_name": "Pan", "institution": null}, {"given_name": "V. P.", "family_name": "Roychowdhury", "institution": null}]}