{"title": "Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison", "book": "Advances in Neural Information Processing Systems", "page_first": 123, "page_last": 130, "abstract": null, "full_text": "Kohonen Feature  Maps and  Growing \n\nCell Structures -\n\na  Performance Comparison \n\nBernd Fritzke \n\nInternational Computer Science  Institute \n\n1947 Center  Street,  Suite 600 \nBerkeley,  CA  94704-1105,  USA \n\nAbstract \n\nA performance comparison of two self-organizing networks,  the Ko(cid:173)\nhonen  Feature Map  and the recently  proposed Growing Cell Struc(cid:173)\ntures  is  made.  For  this  purpose  several  performance  criteria  for \nself-organizing  networks  are  proposed  and  motivated.  The models \nare tested with three example problems of increasing difficulty.  The \nKohonen  Feature  Map  demonstrates slightly superior  results  only \nfor  the simplest problem.  For the other more difficult and also more \nrealistic problems the Growing Cell Structures exhibit significantly \nbetter  performance  by  every  criterion .  Additional  advantages  of \nthe new  model are  that  all  parameters are constant  over  time and \nthat size  as  well  as  structure  of the  network  are  determined  auto(cid:173)\nmatically. \n\n1 \n\nINTRODUCTION \n\nSelf-organizing networks  are  able to generate interesting low-dimensional represen(cid:173)\ntations  of high-dimensional  input  data.  The  most  well-known  of these  models  is \nthe  Kohonen  Feature  Map  (Kohonen  [1982)) .  So far  it  has  been  applied  to a  large \nvariety  of problems including  vector  quantization  (Schweizer  et  al.  [1991)),  biolog(cid:173)\nical  modelling  (Obermayer,  Ritter  &  Schulten  [1990)),  combinatorial optimization \n(Favata  &  Walker  [1991])  and  also  processing  of symbolic  information(Ritter  & \nKohonen  [1989)) . \n\n123 \n\n\f124 \n\nFritzke \n\nIt has been reported by a number of researchers,  that one disadvantage of Kohonen's \nmodel is  the fact,  that the network structure had to be specified  in advance.  This is \ngenerally not possible in an optimal way since  a necessary  piece  of information, the \nprobability distribution of the input signals,  is  usually not  available.  The choice  of \nan unsuitable network structure,  however,  can badly degrade network  performance. \n\nRecently we  have proposed  a new self-organizing network model - the Growing Cell \nStructures - which  can  automatically determine  a  problem specific  network  struc(cid:173)\nture (Fritzke  [1992]).  By  now  the model has been successfully  applied to clustering \n(Fritzke  [1991])  and  combinatorial optimization (Fritzke  &  Wilke  [1991]). \nIn  this  contribution  we  directly  compare  our  model to  that  of Kohonen.  We  first \nreview some general properties of self-organizing networks and several performance \ncriteria  for  these  networks  are  proposed  and  motivated.  The  new  model  is  then \nbriefly  described.  Simulation results  are  presented  and allow  a  comparison of both \nmodels with respect  to the proposed criteria. \n\n2  SELF-ORGANIZING  NETWORKS \n\n2.1  CHARACTERlSTICS \n\nA  self-organizing  network  consists  of a  set  of neurons  arranged  in  some  topolog(cid:173)\nical  structure  which  induces  neighborhood  relations  among  the  neurons.  An  n(cid:173)\ndimensional  reference  vector is  attached  to  every  neuron.  This  vector  determines \nthe specific  n-dimensional input signal to which  the neuron  is  maximally sensitive. \n\nBy  assigning  to  every  input  signal  the  neuron  with  the  nearest  reference  vector \n(according to a  suitable norm), a  mapping is  defined  from  the space  of all  possible \ninput signals onto the neural structure.  A given set of reference  vectors  thus divides \nthe input vector  space  into regions with  a  common nearest  reference  vector.  These \nregions  are  commonly denoted  as  Voronoi  regions  and  the  corresponding  partition \nof the  input vector  space  is  denoted  Voronoi  partition. \n\nSelf-organizing  networks  learn  (change  internal  parameters)  in  an  unsupervised \nmanner  from  a  stream  of input  signals.  These  input  signals  obey  a  generally  un(cid:173)\nknown  probability distribution.  For  each  input signal  the  neuron  with  the  nearest \nreference  vector  is  determined,  the so-called  \"best  matching unit\"  (bmu).  The ref(cid:173)\nerence  vectors  of the  bmu  and of a  number of its  topological neighbors  are  moved \ntowards  the  input  signal.  The  adaptation  of topological  neighbors  distinguishes \nself-organization  (\"winner  take  most\")  from  competitive  learning  where  only  the \nbmu is  adapted  (\"winner  take all\"). \n\n2.2  PERFORMANCE  CRlTERlA \n\nOne  can  identify  three  main criteria for  self-organizing  networks.  The importance \nof each  criterion may vary from application to  application. \n\nTopology Preservation.  This denotes  two  properties  of the mapping defined  by \nthe network.  We  call  the mapping topology-preserving if \n\n\fKohonen Feature Maps and Growing Cell  Structures-a Performance  Comparison \n\n125 \n\na)  similar input vectors are mapped onto identical or closely  neighboring neu(cid:173)\n\nrons  and \n\nb)  neighboring neurons have similar reference  vectors. \n\nProperty  a)  ensures,  that small changes  of the  input  vector  cause  correspondingly \nsmall changes in the position of the bmu.  The mapping is  robust against distortions \nof the input , a very important property for  applications dealing with real , noisy data. \nProperty  b)  ensures  robustness  of the inverse  mapping .  The topology  preservation \nis  especially interesting  when  the  dimension of the input  vectors is  higher  than the \nnetwork  dimension.  Then  the  mapping  reduces  the  data  dimension  but  usually \npreserves  important similarity relations  among the input data. \n\nModelling of Probability Distribution.  A  set  of  reference  vectors  is  said  to \nmodel the probability  distribution, ifthe local density of reference vectors in the input \nvector  space approaches the  probability density  of the input vector  distribution . \n\nThis  property  is  desirable  for  two  reasons.  First,  we  get  an  implicit model  of the \nunknown probability distribution underlying the input signals.  Second,  the network \nbecomes fault-tolerant against  damage, since every  neuron  is  only  \"responsible\"  for \na  small fraction  of all  input  vectors .  If neurons  are  destroyed  for  some  reason  the \nmapping ability of the network  degrades  only proportionally to the number of the \ndestroyed neurons  (soft fail) . This is  a very  desirable property for  technical (as well \nas  natural) systems. \nMinimization of Quantization Error.  The  quantization  error for  a  given input \nsignal  is  the distance  between  this signal  and  the  reference  vector  of the  bmu .  We \ncall  a  set  of reference  vectors  error  minimizing for  a  given  probability  distribution \nif the mean quantization error  is  minimized. \n\nThis  property  is  important,  if the  original  signals  have  to  be  reconstructed  from \nthe reference  vectors which  is  a very  common situation in vector  quantization.  The \nquantization error  in this case  limits the  accuracy  of the  reconstruction . \n\nOne  should  note  that  the  optimal distribution  of reference  vectors  for  error  mini(cid:173)\nmization is  generally  different  from  the  optimal distribution  for  distribution  mod(cid:173)\nelling. \n\n3  THE GROWING CELL STRUCTURES \n\nThe  Growing  Cell  Structures  are  a  self-organizing  network  an  important feature \nof which  is  the  ability  to  automatically find  a  problem  specific  network  structure \nthrough  a  growth process. \nBasic building blocks are k-dimensional hypertetrahedrons: lines for  k  =  1,  triangles \nfor  k = 2, tetrahedrons for  k = 3 etc.  The vertices of the hypertetrahedrons  are the \nneurons  and  the edges  denote  neighborhood  relations. \n\nBy insertion and deletion of neurons the structure is modified.  This is  done during a \nself-organization process  which is similar to that in Kohonen 's model.  Input signals \ncause adaptation of the bmu and its topological neighbors.  In contrast to Kohonen's \nmodel all  parameters are constant including the width of the neighborhood  around \n\n\f126 \n\nFritzke \n\nthe  bmu  where  adaptation  takes  place.  Only  direct  neighbors  and  the  bmu  itself \nare  being  adapted. \n\n3.1 \n\nINSERTION  OF  NEURONS \n\nTo determine  the positions where  new  neurons should  be inserted  the  concept  of a \nresource is introduced.  Every neuron has a local resource  variable and new neurons \nare  always inserted  near  the  neuron  with  the highest  resource  value.  New  neurons \nget  part  of the  resource  of their  neighbors  so  that  in  the  long  run  the  resource  is \ndistributed  evenly  among all  neurons. \n\nEvery  input signal causes  an increase  of the resource  variable of the best  matching \nunit.  Choices for  the resource  examined so far  are \n\n\u2022  the summed quantization error  caused  by  the neuron \n\u2022  the number of input signals received  by  the neuron \n\nAlways  after  a  constant  number  of  adaptation  steps  (e.g.  100)  a  new  neuron  is \ninserted.  For  this  purpose  the  neuron  with  the  highest  resource  is  determined  and \nthe  edge  connecting  it  to  the  neighbor  with  the  most  different  reference  vector  is \n\"split\"  by inserting the new  neuron.  Further edges  are added to rebuild  a structure \nconsisting  only of k-dimensional hypertetrahedrons. \n\nThe  reference  vector  of the  new  neuron  is  interpolated  from  the  reference  vectors \nbelonging to  the ending  points of the  split edge.  The resource  variable of the  new \nneuron is  initialized by subtracting some resource from its neighbors)  the amount of \nwhich is  determined by the reduction of their Voronoi regions through the insertion. \n\n3.2  DELETION  OF  NEURONS \n\nBy  comparing the  fraction  of all  input signals which  a specific  neuron  has  received \nand the volume of its Voronoi region one can derive a local estimate of the  probability \ndensity of the  input  vectors. \n\nThose  neurons)  whose  reference  vectors  fall  into regions  of the  input  vector  space \nwith a very low probability density)  are regarded  as  \"superfluous))  and are removed. \nThe result  are  problem-specific network structures  potentially consisting  of several \nseparate sub  networks  and  accurately modelling a  given  probability distribution. \n\n4  SIMULATION  RESULTS \n\nA  number  of  tests  have  been  performed  to  evaluate  the  performance  of the  new \nmodel.  One series  is  described  in the following. \n\nThree methods have  been  compared. \n\na)  Kohonen  Feature  Maps  (KFM) \nb)  Growing Cell  Structures  with  quantization error  as  resource  (GCS-l) \nc)  Growing Cell Structures with number of input signals as  resource  (GCS-2) \n\n\fKohonen Feature Maps and  Growing Cell Structures-a Performance  Comparison \n\n127 \n\n[J \n\n[J \n\no \n\n[J \n\nc \n\nc \n\n[J \n\n[J \n\nDistribution A: \nThe  probability  density \nis  uniform  in  the  unit \nsquare \n\nDistribution B: \nThe  probability  density \nis  uniform  in  the  lOx \n10-field,  by  a  factor  100 \nhigher  in  the  1  x  I-field \nand zero elsewhere \n\ninside \n\nDistribution C: \nThe  probability  density \nis  uniform \nthe \nseven  lower  squares,  by \na  factor  10  higher  in  the \ntwo  upper  squares  and \nzero  elsewhere. \n\nFigure  1:  Three  different  probability distributions used  for  a  performance compar(cid:173)\nison.  Distribution A  is  very  simple  and has  a  form  ideally suited for  the  Kohonen \nFeature  Map  which  uses  a  square  grid  of neurons.  Distribution  B  was  chosen  to \nshow  the effects  of a  highly varying probability density.  Distribution C  is  the most \nrealistic with  a  number of separate  regions some of which  have  also different  prob(cid:173)\nability densities. \n\nThese models were  applied to the probability distributions shown in fig.  1.  The Ko(cid:173)\nhonen model was used  with a  10 x 10-grid of neurons.  The Growing Cell Structures \nwere  used  to  build up  a  two  dimensional cell  structure of the same size.  This was \nachieved  by stopping the growth process  when  the number of neurons  had reached \n100. \n\nAt the end of the simulation the proposed  criteria were  measured  as follows: \n\n\u2022  The  topology  preservation  requires  two  properties.  Property  a)  was  mea(cid:173)\n\nsured  by  the  topographical  product  recently  proposed  by  Bauer  e.a.  for  this \npurpose  (Bauer  &  Pawelzik  [1992]).  Property  b)  was  measured  by  com(cid:173)\nputing  the  mean  edge  length  in  the  input  space,  i.e.  the  mean  difference \nbetween  reference  vectors  of directly  neighboring neurons. \n\n\u2022  The  distribution  modelling  was  measured  by  generating  5000  test  signals \naccording  to  the  specific  probability  distribution  and  counting  for  every \nneuron  the  number  of  test  signals  it  has  been  bmu  for.  The  standard \ndeviation  of  all  counter  values  was  computed  and  divided  by  the  mean \nvalue  of the  counters  to get  a  normalized measure,  the  distribution  error, \nfor  the modelling of the probability distribution. \n\n\u2022  The error minimization was measured by computing the mean  square  quan(cid:173)\n\ntization  error of the test signals. \n\nThe  numerical  results  of the  simulations are  shown  in  fig.  2.  Typical  examples of \nthe final  network structures  can be seen  in fig.  3.  It can be seen  from fig.  2 that the \n\n\f128 \n\nFritzke \n\nA \n\nmodel \nKFM \nGCS-1 \nGCS-2 \n\nB \n\n0.022 \n0.014 \n10.0111 \n\nC \n\n0.048 \n0.044 \n10.019 1 \n\na)  topographical product \nB \nC \n0.84 \n0.90 \n\u00a7I]  10.591 \n0.73 \n1.57 \n\nmodel \nKFM \nGCS-1 \nGCS-2 \n\nA \n\n0.26 \n\nc)  distribution error \n\nmodel \nKFM \nGCS-1 \nGCS-2 \n\nA \n\nB \n\nC \n\n0.092 \n10.056 1 \n0.071 \n\n0.110 \n0.015 \n10.0131 \n\n0.11 \n0.11 \n\nb)  mean edge  length \nA \n\nB \n\nC \n\n0.00077 \n0.00089 \n10.000551  10.000041 \n\n0.0020 \n0.0019 \n0.0019 \nd)  quantization error \n\n0.00086 \n0.00010 \n\nmodel \nKFM \nGCS-1 \nGCS-2 \n\nFigure  2:  Simulation results  of the  performance  comparison.  The model  of Koho(cid:173)\nnen(KFM)  and  two  versions  of the  Growing  Cell  Structures  have  been  compared \nwith respect  to different  criteria.  All  criteria are such,  that smaller values are better \nvalues.  The  best  (smallest)  value in  each  column is  enclosed  in  a  box.  Simulations \nwere  performed with  the probability distributions A,  Band C  from fig.  1. \n\nmodel of Kohonen  has superior values only for  distribution A,  which is very  regular \nand  formed  exactly  like  the  chosen  network  structure  (a square).  Since  generally \nthe probability distribution is unknown and irregular, the distributions Band C  are \nby far  more realistic.  For  these  distributions the Growing Cell Structures  have the \nbest  values. \n\nThe  modelling of the  distribution  and  the  minimization of the  quantization error \nare  generally  concurring  objectives.  One  has  to  decide  which  objective  is  more \nimportant for  the current application.  Then the appropriate version  ofthe Growing \nCell  Structures  can  optimize with  respect  to  that  objective.  For  the  complicated \ndistribution C,  however,  either version  of the Growing Cell  Structures performs for \nevery  criterion  better  than  Kohonen's model. \n\nEspecially  notable  is  the  low  quantization  error  for  distribution  C  and  the  error \nminimizing  version  (GCS-2)  of  the  Growing  Cell  Structures  (see  fig.  2d).  This \nvalue indicates a  good  potential for  vector  quantization. \n\n5  DISCUSSION \n\nOur  investigations  indicate  that  - w.r.t  the  proposed  criteria  - the  Growing  Cell \nStructures  are superior to Kohonen's model for  all  but very  carefully chosen  trivial \nexamples.  Although we  used small examples for  the sake of clarity, our experiments \nlead us to conjecture,  that the difference  will further  increase with the difficulty and \nsize  of the problem. \n\nThere are some other important advantages of our  approach.  First,  all  parameters \nare  constant.  This  eliminates  the  difficult  choice  of  a  \"cooling  schedule\"  which \nis  necessary  in  Kohonen's  model.  Second,  the  network  size  does  not  have  to  be \nspecified  in advance.  Instead the growth process can be continued until an arbitrary \nperformance  criterion  is  met.  To  meet  a  specific  criterion  with  Kohonen's  model, \none  generally  has  to  try  different  network sizes.  To start  always  with  a  very  large \n\n\fKohonen Feature Maps and  Growing  Cell  Structures-a Performance  Comparison \n\n129 \n\nDistribution A \n\nDistribution B \n\nDistribution C \n\n-I--\n\n-f-\nv -\n\n-\" \n\na) \n\nb) \n\nc) \n\nFigure  3:  Typical  simulation  results  for  the  model  of  Kohonen  and  the  two  ver(cid:173)\nsions  of the  Growing  Cell  Structures.  The network  size  is  100  in  every  case.  The \nprobability distributions are  described  in fig.  1. \na)  Kohonen  Feature  Map  (KFM).  For  distributions  Band  C  the  fixed  network \nstructure  leads  to  long  connections  and  neurons  in  regions  with  zero  probability \ndensity. \nb)  Growing  Cell  Structures,  distribution  modelling  variant  (GCS-1).  The  growth \nprocess  combined with occasional  removal of \"superfluous\"  neurons  has led  to sev(cid:173)\neral  sub  networks  for  distributions  Band  C.  For  distribution  B  roughly  half  of \nthe  neurons  are  used  to model either  of the  squares.  This  corresponds  well  to the \nunderlying probability density. \nc)  Growing  Cell  Structures,  error  minimizing  variant  (GCS-2).  The  difference  to \nthe  previous variant can  be seen  best for  distribution  B,  where  only  a few  neurons \nare  used  to cover  the small square. \n\n\f130 \n\nFritzke \n\nnetwork is not a good solution to this problem, since the computational effort grows \nfaster  than quadratically with the network size. \n\nCurrently  applications  of variants  of the  new  method  to  image  compression  and \nrobot control are being investigated.  Furthermore a new type of radial basis function \nnetwork  related  to  (Moody  &  Darken  [1989])  is  being  explored,  which  is  based  on \nthe Growing  Cell  Structures. \n\nREFERENCES \n\nBauer,  H.- U.  &  K.  Pawelzik [1992},  \"Quantifying the neighborhood  preservation of \nself-organizing  feature  maps,\"  IEEE  Transactions  on  Neural  Networks 3, \n570-579. \n\nFavata, F.  &  R.  Walker [1991]'  \"A study of the application of Kohonen-type neural \nnetworks  to  the  travelling  Salesman  Problem,\"  Biological  Cybernetics 64, \n463-468. \n\nFritzke,  B. [1991],  \"Unsupervised  clustering  with  growing  cell  structures,\"  Proc.  of \n\nIJCNN-91,  Seattle, 531-536  (Vol.  II). \n\nFritzke,  B. [1992],  \"Growing cell  structures  - a self-organizing network  in k  dimen(cid:173)\nsions,\"  in  Artificial  Neural  Networks  II,  I.  Aleksander  &  J.  Taylor,  eds., \nNorth-Holland,  Amsterdam, 1051-1056. \n\nFritzke,  B.  & P.  Wilke [19911,  \"FLEXMAP - A neural network with linear time and \nspace  complexity for  the traveling salesman problem,\"  Proc.  of IJCNN-91, \nSingapore,  929-934. \n\nKohonen,  T. [19821,  \"Self-organized  formation  of  topologically  correct  feature \n\nmaps,\"  Biological Cybernetics 43,  59-69. \n\nMoody,  J.  & C.  Darken [19891,  \"Fast  Learning  in  Networks  of  Locally-Tuned  Pro(cid:173)\n\ncessing  Units,\"  Neural  Computation 1,  281-294. \n\nObermayer,  K.,  H.  Ritter  &  K.  Schulten [1990J,  \"Large-scale  simulations  of self(cid:173)\n\norganizing neural networks on parallel computers:  application to biological \nmodeling,\"  Parallel  Computing 14,381-404. \n\nRitter, H.J. & T. Kohonen [1989],  \"Self-Organizing Semantic Maps,\"  Biological Cy(cid:173)\n\nbernetics 61,241-254. \n\nSchweizer,  L.,  G.  Parladori,  G.L.  Sicuranza & S.  Marsi [1991},  \"A fully  neural  ap(cid:173)\nproach  to image compression,\"  in Artificial Neural  Networks,  T.  Kohonen, \nK.  Miikisara,  O.  Simula  &  J.  Kangas,  eds.,  North-Holland,  Amsterdam, \n815-820. \n\n\f", "award": [], "sourceid": 694, "authors": [{"given_name": "Bernd", "family_name": "Fritzke", "institution": null}]}