{"title": "A Growing Neural Gas Network Learns Topologies", "book": "Advances in Neural Information Processing Systems", "page_first": 625, "page_last": 632, "abstract": null, "full_text": "A  Growing  Neural Gas  Network Learns \n\nTopologies \n\nBernd Fritzke \n\nInstitut fur Neuroinformatik \nRuhr-Universitat Bochum \n\nD-44 780  Bochum \n\nGermany \n\nAbstract \n\nAn incremental network model is  introduced which is  able to learn \nthe important topological relations in a given set of input vectors by \nmeans of a  simple Hebb-like learning rule.  In contrast to previous \napproaches like the \"neural gas\"  method of Martinetz and Schulten \n(1991,  1994), this model has no parameters which change over time \nand is able to continue learning, adding units and connections, until \na  performance criterion has been met.  Applications of the model \ninclude vector quantization, clustering, and interpolation. \n\n1 \n\nINTRODUCTION \n\nIn  unsupervised learning settings  only  input  data is  available  but  no  information \non the desired output.  What can the goal of learning be in this situation? \n\nOne possible objective is  dimensionality  reduction:  finding  a  low-dimensional sub(cid:173)\nspace  of the  input  vector  space  containing  most  or  all  of the  input  data.  Linear \nsubspaces with this property can be computed directly by principal component anal(cid:173)\nysis or iteratively with a number of network models (Sanger, 1989;  Oja,  1982).  The \nKohonen feature map (Kohonen,  1982)  and the  \"growing cell  structures\"  (Fritzke, \n1994b)  allow  projection onto  non-linear,  discretely sampled subspaces of a  dimen(cid:173)\nsionality  which  has  to  be  chosen  a  priori.  Depending  on  the  relation  between \ninherent data dimensionality and dimensionality of the target space, some informa(cid:173)\ntion on the topological arrangement of the input data may be lost  in  the process. \n\n\f626 \n\nBernd Fritzke \n\nThis  is  not  astonishing since  a  reversible  mapping from  high-dimensional data to \nlower-dimensional spaces (or structures) does not exist in general. \n\nAsking how structures must look like to allow reversible mappings directly leads to \nanother possible objective of unsupervised learning which can be described as topol(cid:173)\nogy  learning:  Given some high-dimensional data distributionp(e), find a topological \nstructure which  closely  reflects the topology  of the data distribution.  An elegant \nmethod to construct such structures is  \"competitive Hebbian learning\" (CHL) (Mar(cid:173)\ntinetz, 1993).  CHL requires the use of some vector quantization method.  Martinetz \nand  Schulten propose the  \"neural gas\"  (NG)  method for  this purpose  (Martinetz \nand Schulten, 1991). \nWe will briefly introduce and discuss the approach of Martinetz and Schulten.  Then \nwe  propose  a  new  network  model  which  also  makes  use  of  CHL.  In  contrast  to \nthe  above-mentioned  CHL/NG  combination,  this  model  is  incremental  and  has \nonly constant parameters.  This leads to a  number of advantages over  the previous \napproach. \n\n2  COMPETITIVE HEBBIAN LEARNING  AND \n\nNEURAL  GAS \n\nCHL (Martinetz, 1993) assumes a number of centers in R n  and successively inserts \ntopological connections among them by evaluating input signals drawn from a data \ndistribution p(e).  The principle of this method is: \n\nFor  each input signal x  connect the two  closest  centers (measured \nby Euclidean distance) by an edge. 
\n\nThe resulting  graph  is  a  subgraph of the  Delaunay  triangulation  (fig.  1a)  corre(cid:173)\nsponding to the set of centers.  This subgraph (fig.  1b), which is called the \"induced \nDelaunay  triangulation\",  is  limited  to  those  areas  of  the  input  space  R n  where \np(e\u00bb  O.  The  \"induced  Delaunay  triangulation\"  has  been  shown  to  optimally \npreserve topology in a  very general sense (Martinetz, 1993). \n\nOnly centers lying on the input data submanifold or in its vicinity actually develop \nany edges.  The others are useless for the purpose of topology learning and are often \ncalled dead  units.  To make use of all centers they have to be placed in those regions \nof R n  where P (e)  differs from zero.  This could be done by any vector quantization \n(VQ)  procedure.  Martinetz and Schulten have  proposed a  particular kind  of VQ \nmethod,  the  mentioned  NG  method  (Martinetz  and  Schulten,  1991).  The  main \nprinciple of NG is the following: \n\nFor  each  input signal  x  adapt the k  nearest  centers whereby  k  is \ndecreasing from a large initial to a small final  value. \n\nA  large initial  value of k  causes adaptation  (movement  towards  the input  signal) \nof a  large fraction of the centers.  Then k  (the adaptation range) is decreased until \nfinally  only  the  nearest  center for  each  input  signal  is  adapted.  The  adaptation \nstrength underlies a similar decay schedule.  To realize the parameter decay one has \nto define the total number of adaptation steps for the NG method in advance. \n\n\fA Growing Neural Gas Network Learns Topologies \n\n627 \n\na)  Delaunay triangulation \n\nb) induced Delaunay triangulation \n\nFigure  1:  Two  ways  of defining  closeness  among a  set  of points.  a)  The  Delau(cid:173)\nnay  triangulation  (thick  lines)  connects  points  having  neighboring  Voronoi  poly(cid:173)\ngons  (thin lines).  Basically this  reduces  to points having small Euclidean distance \nw.r.t.  the given set of points.  b)  The  induced  Delaunay  triangulation  (thick lines) \nis  obtained by  masking  the original  Delaunay  triangulation  with  a  data distribu(cid:173)\ntion P(~) (shaded) .  Two centers are only connected if the common border of their \nVoronoi polygons lies  at least partially in a  region where P(~\u00bb  0  (closely  adapted \nfrom Martinetz and Schulten,  1994) \n\nFor  a  given  data distribution  one  could  now  first  run  the  NG  algorithm  to  dis(cid:173)\ntribute  a  certain number  of centers  and  then use  CHL  to  generate the  topology. \nIt is,  however,  also  possible to apply both techniques concurrently (Martinetz and \nSchulten, 1991).  In this case a  method for removing obsolete edges is required since \nthe motion of the centers may make edges  invalid which  have  been generated ear(cid:173)\nlier.  Martinetz and Schulten use an edge  aging scheme for this purpose.  One should \nnote that the CHL algorithm does not influence the outcome of the NG  method in \nany way since the adaptations in NG are based only on distance in input space and \nnot  on the network  topology.  On  the other hand NG  does influence the topology \ngenerated by CHL since it moves the centers around. \n\nThe combination of NG and CHL described above is  an effective method for  topol(cid:173)\nogy  learning.  A  problem in  practical applications,  however,  may  be to determine \na  priori  a  suitable  number of  centers.  
Figure 1: Two ways of defining closeness among a set of points. a) The Delaunay triangulation (thick lines) connects points having neighboring Voronoi polygons (thin lines). Basically this reduces to points having small Euclidean distance w.r.t. the given set of points. b) The induced Delaunay triangulation (thick lines) is obtained by masking the original Delaunay triangulation with a data distribution p(ξ) (shaded). Two centers are only connected if the common border of their Voronoi polygons lies at least partially in a region where p(ξ) > 0 (closely adapted from Martinetz and Schulten, 1994).

For a given data distribution one could now first run the NG algorithm to distribute a certain number of centers and then use CHL to generate the topology. It is, however, also possible to apply both techniques concurrently (Martinetz and Schulten, 1991). In this case a method for removing obsolete edges is required, since the motion of the centers may invalidate edges which have been generated earlier. Martinetz and Schulten use an edge aging scheme for this purpose. One should note that the CHL algorithm does not influence the outcome of the NG method in any way, since the adaptations in NG are based only on distance in input space and not on the network topology. On the other hand, NG does influence the topology generated by CHL since it moves the centers around.

The combination of NG and CHL described above is an effective method for topology learning. A problem in practical applications, however, may be to determine a priori a suitable number of centers. Depending on the complexity of the data distribution which one wants to model, very different numbers of centers may be appropriate. The nature of the NG algorithm requires a decision in advance and, if the result is not satisfying, one or several new simulations have to be performed from scratch. In the following we propose a method which overcomes this problem and offers a number of other advantages through a flexible scheme for center insertion.

3 THE GROWING NEURAL GAS ALGORITHM

In the following we consider networks consisting of

•  a set A of units (or nodes). Each unit c ∈ A has an associated reference vector w_c ∈ R^n. The reference vectors can be regarded as positions in input space of the corresponding units.

•  a set N of connections (or edges) among pairs of units. These connections are not weighted. Their sole purpose is the definition of topological structure.

Moreover, there is a (possibly infinite) number of n-dimensional input signals obeying some unknown probability density function p(ξ).

The main idea of the method is to successively add new units to an initially small network by evaluating local statistical measures gathered during previous adaptation steps. This is the same approach as used in the "growing cell structures" model (Fritzke, 1994b) which, however, has a topology with a fixed dimensionality (e.g., two or three).

In the approach described here, the network topology is generated incrementally by CHL and has a dimensionality which depends on the input data and may vary locally. The complete algorithm for our model, which we call "growing neural gas", is given by the following (a compact implementation sketch follows the listing):

0. Start with two units a and b at random positions w_a and w_b in R^n.

1. Generate an input signal ξ according to p(ξ).

2. Find the nearest unit s1 and the second-nearest unit s2.

3. Increment the age of all edges emanating from s1.

4. Add the squared distance between the input signal and the nearest unit in input space to a local error variable:

   Δerror(s1) = ||w_s1 - ξ||^2

5. Move s1 and its direct topological neighbors[1] towards ξ by fractions ε_b and ε_n, respectively, of the total distance:

   Δw_s1 = ε_b (ξ - w_s1)
   Δw_n  = ε_n (ξ - w_n)   for all direct neighbors n of s1

6. If s1 and s2 are connected by an edge, set the age of this edge to zero. If such an edge does not exist, create it.[2]

7. Remove edges with an age larger than a_max. If this results in units having no emanating edges, remove them as well.

8. If the number of input signals generated so far is an integer multiple of a parameter λ, insert a new unit as follows:

   •  Determine the unit q with the maximum accumulated error.

   •  Insert a new unit r halfway between q and its neighbor f with the largest error variable:

      w_r = 0.5 (w_q + w_f)

   •  Insert edges connecting the new unit r with units q and f, and remove the original edge between q and f.

   •  Decrease the error variables of q and f by multiplying them with a constant α. Initialize the error variable of r with the new value of the error variable of q.

9. Decrease all error variables by multiplying them with a constant d.

10. If a stopping criterion (e.g., net size or some performance measure) is not yet fulfilled, go to step 1.

[1] Throughout this paper the term neighbors denotes units which are topological neighbors in the graph (as opposed to units within a small Euclidean distance of each other in input space).

[2] This step is Hebbian in its spirit since correlated activity is used to decide upon insertions.
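The listing translates almost directly into code. Below is a minimal sketch, assuming NumPy and a caller-supplied sample() function that draws signals from p(ξ); the function name gng and its internal data structures are illustrative, not part of the paper:

    import numpy as np

    def gng(sample, max_units=100, lam=100, eps_b=0.2, eps_n=0.006,
            alpha=0.5, a_max=50, d=0.995):
        w = {0: sample(), 1: sample()}   # 0. two units at random positions
        err = {0: 0.0, 1: 0.0}           # accumulated error per unit
        age = {}                         # edges: frozenset({i, j}) -> age
        nxt, t = 2, 0                    # next free unit id, signal counter

        def neighbors(u):                # direct topological neighbors of u
            return [v for e in age for v in e if u in e and v != u]

        while len(w) < max_units:        # 10. stopping criterion: net size
            xi = sample()                # 1. draw an input signal
            t += 1
            dist = {u: np.linalg.norm(xi - w[u]) for u in w}
            s1, s2 = sorted(w, key=dist.get)[:2]  # 2. two nearest units
            for e in age:                # 3. age edges emanating from s1
                if s1 in e:
                    age[e] += 1
            err[s1] += dist[s1] ** 2     # 4. accumulate squared error
            w[s1] = w[s1] + eps_b * (xi - w[s1])  # 5. adapt s1 ...
            for n in neighbors(s1):               # ... and its neighbors
                w[n] = w[n] + eps_n * (xi - w[n])
            age[frozenset((s1, s2))] = 0  # 6. create or refresh edge s1-s2
            for e in [e for e in age if age[e] > a_max]:
                del age[e]                # 7. remove over-aged edges ...
            for u in [u for u in w if not neighbors(u)]:
                del w[u]; del err[u]      # ... and units left without edges
            if t % lam == 0:              # 8. insert a new unit
                q = max(err, key=err.get)            # max accumulated error
                f = max(neighbors(q), key=err.get)   # its worst neighbor
                r, nxt = nxt, nxt + 1
                w[r] = 0.5 * (w[q] + w[f])  # halfway between q and f
                del age[frozenset((q, f))]
                age[frozenset((q, r))] = age[frozenset((f, r))] = 0
                err[q] *= alpha
                err[f] *= alpha
                err[r] = err[q]           # new value of q's error variable
            for u in err:                 # 9. decay all error variables
                err[u] *= d
        return w, age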
\n\u2022  Insert  a  new  unit  r  halfway  between  q  and  its  neighbor  f  with  the \n\nlargest error variable: \n\nWr  =  0.5 (wq + wf)' \n\n\u2022  Insert edges connecting the new unit r  with units q and f, and remove \n\nthe original edge between q and f. \n\n\u2022  Decrease the  error  variables  of q  and f  by  multiplying  them  with  a \nconstant 0:.  Initialize the error variable of r  with the new value of the \nerror variable of q. \n\n9.  Decrease all error variables  by multiplying them with a  constant d. \n10.  If a  stopping criterion  (e.g.,  net size  or some performance measure) is  not \n\nyet fulfilled  go to step 1. \n\nHow  does  the  described  method work?  The  adaptation steps  towards  the  input \nsignals  (5.)  lead to a  general movement of all units towards those areas of the input \nspace  where  signals  come from  (P(~) >  0).  The  insertion  of edges  (6.)  between \nthe nearest and the second-nearest unit with respect to an input signal generates a \nsingle connection of the \"induced Delaunay triangulation\"  (see fig.  1b)  with respect \nto  the  current position  of all  units. \n\nThe removal of edges (7.)  is  necessary to get rid of those edges which are no longer \npart  of  the  \"induced  Delaunay  triangulation\"  because  their  ending  points  have \nmoved.  This is  achieved by  local edge aging  (3.)  around the nearest unit combined \nwith  age  re-setting  of those  edges  (6.)  which  already  exist  between  nearest  and \nsecond-nearest units. \n\nWith  insertion and  removal  of edges  the  model tries  to  construct  and  then track \nthe  \"induced Delaunay triangulation\"  which  is  a  slowly  moving  target  due to the \nadaptation of the reference vectors. \n\nThe accumulation of squared distances (4.)  during the adaptation helps to identify \nunits  lying  in  areas  of the  input  space  where  the  mapping from  signals  to  units \ncauses much error.  To reduce this error, new units are inserted in such regions. \n\n4  SIMULATION RESULTS \n\nWe will now give some simulation results to demonstrate the general behavior of our \nmodel.  The probability distribution in fig.  2 has  been proposed by Martinetz and \nSchulten (1991) to demonstrate the non-incremental \"neural gas\"  model.  It can be \nseen that our model quickly learns the important topological relations in this rather \ncomplicated distribution by forming structures of different dimensionalities. \nThe second example (fig.  3)  illustrates the differences between the proposed model \nand the original NG network.  Although the final topology is rather similar for  both \nmodels, intermediate stages are quite different.  Both models are able to identify the \nclusters in the given distribution.  Only  the  \"growing neural gas\"  model,  however, \n\n\f630 \n\nBernd Fritzke \n\nFigure 2:  The  \"growing neural gas\"  network adapts to a  signal distribution which \nhas different  dimensionalities in different  areas of the input space.  Shown are the \ninitial network consisting of two randomly placed units and the networks after 600, \n1800,  5000,  15000  and  20000  input  signals  have  been applied.  The last  network \nshown is  not the necessarily the \"final\"  one since the growth process could in prin(cid:173)\nciple  be continued indefinitely.  The parameters for  this simulation were:  A =  100, \nEb  = 0.2,  En  = 0.006,  a  = 0.5,  amaz  = 50,  d = 0.995. 
\n\ncould  continue  to grow  to  discover  still  smaller clusters  (which  are not  present  in \nthis particular example,  though). \n\n5  DISCUSSION \n\nThe \"growing neural gas\"  network presented here is able to make explicit the impor(cid:173)\ntant topological relations in a given distribution pee) of input signals.  An advantage \nover  the NG  method of Martinetz and Schulten is  the incremental character of the \nmodel which eliminates the need to pre-specify a  network size.  Instead, the growth \nprocess can be continued until a user-defined performance criterion or network size \nis  met.  All  parameters are constant  over  time in contrast  to  other models  which \nheavily rely on decaying parameters (such as the NG method or the Kohonen feature \nmap). \nIt should be noted that the topology generated by CHL is  not an optional feature \n\n\fA Growing Neural Gas Network Learns Topologies \n\n631 \n\n\"neural gas\"  and \n\n\"competitive Hebbian learning\" \n\n\"growing neural gas\" \n\n(uses  \"competitive Hebbian learning\") \n\n0 \n0 0 0   0 \n\no \n\n0 \n\n00  0 \n\no~oo 000 \n\n\u00b7~~r~: ;-.; .. {]J'. \no \n\nco  00  \n\n0 \n\n~ \n\n0 \n\no \n\n00 \n\n8 \n\n0 \n\n0 90 -\no \n\nV \n\nj \n\nFigure 3:  The NG/CHL network of Martinetz and Schulten (1991) and the author's \n\"growing  neural gas\"  model  adapt  to  a  clustered probability distribution.  Shown \nare  the  respective  initial  states  (top  row)  and  a  number  of intermediate stages. \nBoth the number  of units  in  the  NG  model and the final  number  of units in  the \n\"growing  neural  gas\"  model are  100.  The  bottom row  shows  the  distribution  of \ncenters after 10000 adaptation steps (the edges are as in the previous row  but not \nshown).  The  center  distribution  is  rather  similar  for  both models  although  the \nintermediate stages differ significantly. \n\n\f632 \n\nBernd Fritzke \n\nof our  method (as  it  is  for  the NG  model)  but  an  essential  component  since  it  is \nused to direct the (completely local) adaptation as well as insertion of centers.  It is \nprobably the proper initialization of new  units by interpolation from existing  ones \nwhich makes it possible to have  only constant parameters and local adaptations. \n\nPossible applications of our model are clustering (as shown) and vector quantization. \nThe network should perform particularly well  in situations where the neighborhood \ninformation  (in  the  edges)  is  used  to  implement  interpolation  schemes  between \nneighboring units.  By using the error occuring in early phases it can be determined \nwhere to insert new units to generate a topological look-up table of different density \nand different  dimensionality in particular areas of the input data space. \n\nAnother promising direction of research is the combination with supervised learning. \nThis has been done earlier with  the  \"growing cell  structures\"  (Fritzke,  1994c)  and \nrecently also with the \"growing neural gas\"  described in this paper (Fritzke,  1994a). \nA crucial property for this kind of application is the possibility to choose an arbitrary \ninsertion criterion.  This is a feature not present, e.g., in the original \"growing neural \ngas\".  The first  results of this new supervised network model, an incremental radial \nbasis  function  network,  are  very  promising  and  we  are further  investigating  this \ncurrently. \n\nReferences \n\nFritzke, B.  (1994a).  Fast learning with incremental rbf networks.  
Another promising direction of research is the combination with supervised learning. This has been done earlier with the "growing cell structures" (Fritzke, 1994c) and recently also with the "growing neural gas" described in this paper (Fritzke, 1994a). A crucial property for this kind of application is the possibility to choose an arbitrary insertion criterion, a feature not present in, e.g., the original "growing neural gas". The first results of this new supervised network model, an incremental radial basis function network, are very promising, and we are currently investigating it further.

References

Fritzke, B. (1994a). Fast learning with incremental RBF networks. Neural Processing Letters, 1(1):2-5.

Fritzke, B. (1994b). Growing cell structures - a self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9):1441-1460.

Fritzke, B. (1994c). Supervised learning with growing cell structures. In Cowan, J., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 255-262. Morgan Kaufmann Publishers, San Mateo, CA.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59-69.

Martinetz, T. M. (1993). Competitive Hebbian learning rule forms perfectly topology preserving maps. In ICANN'93: International Conference on Artificial Neural Networks, pages 427-434, Amsterdam. Springer.

Martinetz, T. M. and Schulten, K. J. (1991). A "neural-gas" network learns topologies. In Kohonen, T., Mäkisara, K., Simula, O., and Kangas, J., editors, Artificial Neural Networks, pages 397-402. North-Holland, Amsterdam.

Martinetz, T. M. and Schulten, K. J. (1994). Topology representing networks. Neural Networks, 7(3):507-522.

Oja, E. (1982). A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267-273.

Sanger, T. D. (1989). An optimality principle for unsupervised learning. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1, pages 11-19. Morgan Kaufmann, San Mateo, CA.