{"title": "Limitations of Self-organizing Maps for Vector Quantization and Multidimensional Scaling", "book": "Advances in Neural Information Processing Systems", "page_first": 445, "page_last": 451, "abstract": null, "full_text": "Limitations of self-organizing maps for vector quantization and multidimensional scaling\n\nArthur Flexer\n\nThe Austrian Research Institute for Artificial Intelligence\nSchottengasse 3, A-1010 Vienna, Austria\n\nand\n\nDepartment of Psychology, University of Vienna\nLiebiggasse 5, A-1010 Vienna, Austria\n\narthur@ai.univie.ac.at\n\nAbstract\n\nThe limitations of using self-organizing maps (SOM) for either clustering/vector quantization (VQ) or multidimensional scaling (MDS) are discussed by reviewing recent empirical findings and the relevant theory. SOM's remaining ability to do both VQ and MDS at the same time is challenged by a new combined technique of online K-means clustering plus Sammon mapping of the cluster centroids. In a comprehensive empirical study using a series of multivariate normal clustering problems, SOM are shown to perform significantly worse in terms of quantization error, in recovering the structure of the clusters, and in preserving the topology.\n\n1 Introduction\n\nSelf-organizing maps (SOM), introduced by [Kohonen 84], are a very popular tool for visualizing high dimensional data spaces. SOM can be said to do clustering/vector quantization (VQ) and at the same time to preserve the spatial ordering of the input data, reflected by an ordering of the code book vectors (cluster centroids) in a one or two dimensional output space, where the latter property is closely related to multidimensional scaling (MDS) in statistics. 
Although the level of activity and research around the SOM algorithm is quite high (a recent overview by [Kohonen 95] contains more than 1000 citations), there is little comparison available among the numerous existing variants of the basic approach, or with the more traditional statistical techniques of the larger frameworks of VQ and MDS. Additionally, there is little advice in the literature on how to use SOM properly in order to get optimal results in terms of either vector quantization (VQ) or multidimensional scaling or maybe even both of them. To make the notion of SOM being a tool for \"data visualization\" more precise, the following question has to be answered: Should SOM be used for doing VQ, MDS, both at the same time, or none of them?\n\nTwo recent comprehensive studies comparing SOM to traditional VQ or MDS techniques separately seem to indicate that SOM is not competitive when used for either VQ or MDS: [Balakrishnan et al. 94] compare SOM to K-means clustering on 108 multivariate normal clustering problems with known clustering solutions and show that SOM performs significantly worse in terms of data points misclassified (see footnote 1), especially with higher numbers of clusters in the data sets. [Bezdek & Nikhil 95] compare SOM to principal component analysis and the MDS technique Sammon mapping on seven artificial data sets with different numbers of points and dimensionality and different shapes of input distributions. The degree of preservation of the spatial ordering of the input data is measured via a Spearman rank correlation between the distances of points in the input space and the distances of their projections in the two dimensional output space. 
The traditional MDS techniques preserve the distances much more effectively than SOM, whose performance decreases rapidly with increasing dimensionality of the input data.\n\nDespite these strong empirical findings that speak against the use of SOM for either VQ or MDS, there remains the appealing ability of SOM to do both VQ and MDS at the same time. It is the aim of this work to find out whether a combined technique of traditional vector quantization (clustering) plus MDS on the code book vectors (cluster centroids) can perform better than Kohonen's SOM on a series of multivariate normal clustering problems in terms of quantization error (mean squared error), recovering the cluster structure (Rand index), and preserving the topology (Pearson correlation). All the experiments were done in a rigorous statistical design using multivariate analysis of variance for evaluation of the results.\n\n2 SOM and vector quantization/clustering\n\nA vector quantizer (VQ) is a mapping, $q$, that assigns to each input vector $x$ a reproduction (code book) vector $\\hat{x} = q(x)$ drawn from a finite reproduction alphabet $A = \\{\\hat{x}_i;\\ i = 1, \\ldots, N\\}$. The quantizer $q$ is completely described by the reproduction alphabet (or codebook) $A$ together with the partition $S = \\{S_i;\\ i = 1, \\ldots, N\\}$ of the input vector space into the sets $S_i = \\{x : q(x) = \\hat{x}_i\\}$ of input vectors mapping into the $i$th reproduction vector (or code word) [Linde et al. 80]. To be comparable to SOM, our VQ assigns to each of the input vectors $x = (x^0, x^1, \\ldots, x^{k-1})$ a so-called code book vector $\\hat{x} = (\\hat{x}^0, \\hat{x}^1, \\ldots, \\hat{x}^{k-1})$ of the same dimensionality $k$. For reasons of data compression, the number of code book vectors $N \\ll n$, where $n$ is the number of input vectors. 
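\n\nAs a small worked illustration (the numbers are chosen here purely for illustration and do not appear in the study; the nearest-code-word rule under squared Euclidean distance is assumed), take $k = 2$, $N = 2$ and the code book $\\hat{x}_1 = (0, 0)$, $\\hat{x}_2 = (4, 0)$. For the input vector $x = (1, 1)$:\n\n$$\\|x - \\hat{x}_1\\|^2 = 1^2 + 1^2 = 2, \\qquad \\|x - \\hat{x}_2\\|^2 = (-3)^2 + 1^2 = 10,$$\n\nso $q(x) = \\hat{x}_1$ and $x \\in S_1$.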
\n\nWhat is required is a VQ that produces a mapping $q$ for which the expected distortion caused by reproducing the input vectors $x$ by code book vectors $q(x)$ is at least locally minimal. The expected distortion is usually estimated by the average distortion $D$,\n\n$$D = \\frac{1}{n} \\sum_{j=0}^{n-1} d(x_j, q(x_j)) \\qquad (1)$$\n\nwhere the most common distortion measure is the squared-error distortion $d$:\n\n$$d(x, \\hat{x}) = \\sum_{i=0}^{k-1} | x^i - \\hat{x}^i |^2 \\qquad (2)$$\n\nFootnote 1: Although SOM is an unsupervised technique not built for classification, the number of points misclassified to a wrong cluster center is an appropriate and commonly used performance measure for cluster procedures if the true cluster structure is known.\n\nThe classical vector quantization technique to achieve such a mapping is the LBG algorithm [Linde et al. 80], where a given quantizer is iteratively improved. [Linde et al. 80] already noted that their proposed algorithm is almost identical to the k-means approach developed in the cluster analysis literature starting from [MacQueen 67]. Closely related to SOM is online K-means clustering (oKMC), consisting of the following steps:\n\n1. Initialization: Given $N$ = number of code book vectors, $k$ = dimensionality of the vectors, $n$ = number of input vectors, a training sequence $\\{x_j;\\ j = 0, \\ldots, n-1\\}$, an initial set $A_0$ of $N$ code book vectors $\\hat{x}$ and a discrete-time coordinate $t = 0, \\ldots, n-1$.\n\n2. Given $A_t = \\{\\hat{x}_i;\\ i = 1, \\ldots, N\\}$, find the minimum distortion partition $P(A_t) = \\{S_i;\\ i = 1, \\ldots, N\\}$. Compute $d(x_t, \\hat{x}_i)$ for $i = 1, \\ldots, N$. If $d(x_t, \\hat{x}_i) \\le d(x_t, \\hat{x}_l)$ for all $l$, then $x_t \\in S_i$.\n\n3. 
Update the code book vector with the minimum distortion\n\n$$\\hat{x}^{(t)}(S_i) = \\hat{x}^{(t-1)}(S_i) + \\alpha [x_t - \\hat{x}^{(t-1)}(S_i)] \\qquad (3)$$\n\nwhere $\\alpha$ is a learning parameter to be defined by the user. Define $A_{t+1} = \\hat{x}(P(A_t))$, replace $t$ by $t + 1$; if $t = n - 1$, halt. Else go to step 2.\n\nThe main difference between the SOM algorithm and oKMC is the fact that the code book vectors are ordered either on a line or on a planar grid (i.e. in a one or two dimensional output space). The iterative procedure is the same as with oKMC, where formula (3) is replaced by\n\n$$\\hat{x}^{(t)}(S_i) = \\hat{x}^{(t-1)}(S_i) + h [x_t - \\hat{x}^{(t-1)}(S_i)] \\qquad (4)$$\n\nand this update is computed not only for the $\\hat{x}_i$ that gives minimum distortion, but also for all the code book vectors which are in the neighbourhood of this $\\hat{x}_i$ on the line or planar grid. The degree of neighbourhood and the number of code book vectors which are updated together with the $\\hat{x}_i$ that gives minimum distortion are expressed by $h$, a function that decreases both with distance on the line or planar grid and with time and that also includes an additional learning parameter $\\alpha$. If the degree of neighbourhood is decreased to zero, the SOM algorithm becomes equal to the oKMC algorithm.\n\nWhereas local convergence is guaranteed for oKMC (at least for decreasing $\\alpha$, [Bottou & Bengio 95]), no general proof for the convergence of SOM with nonzero neighbourhood is known. [Kohonen 95, p.128] notes that the last steps of the SOM algorithm should be computed with zero neighbourhood in order to guarantee \"the most accurate density approximation of the input samples\".\n\n3 SOM and multidimensional scaling\n\nFormally, a topology preserving algorithm is a transformation $\\Phi : R^k \\to R^p$ that either preserves similarities or just 
similarity orderings of the points in the input space $R^k$ when they are mapped into the output space $R^p$. For most algorithms it is the case that both the number of input vectors $| x \\in R^k |$ and the number of output vectors $| \\tilde{x} \\in R^p |$ are equal to $n$. A transformation $\\Phi : \\tilde{x} = \\Phi(x)$ that preserves similarities poses the strongest possible constraint, since $d(x_i, x_j) = \\tilde{d}(\\tilde{x}_i, \\tilde{x}_j)$ for all $x_i, x_j \\in R^k$, all $\\tilde{x}_i, \\tilde{x}_j \\in R^p$, $i, j = 1, \\ldots, n - 1$, with $d$ ($\\tilde{d}$) being a measure of distance in $R^k$ ($R^p$). Such a transformation is said to produce an isometric image.\n\nTechniques for finding such transformations $\\Phi$ are, among others, various forms of multidimensional scaling (MDS; see footnote 2) like metric MDS [Torgerson 52], nonmetric MDS [Shepard 62] or Sammon mapping [Sammon 69], but also principal component analysis (PCA) (see e.g. [Jolliffe 86]) or SOM. Sammon mapping does MDS by minimizing the following via steepest descent:\n\n$$E = \\frac{1}{\\sum_{i<j} d(x_i, x_j)} \\sum_{i<j} \\frac{[d(x_i, x_j) - \\tilde{d}(\\tilde{x}_i, \\tilde{x}_j)]^2}{d(x_i, x_j)} \\qquad (5)$$\n\nSince the SOM has been designed heuristically and not to find an extremum of a certain cost or energy function (see footnote 3), the theoretical connection to the other MDS algorithms remains unclear. It should be noted that for SOM the number of output vectors $| \\tilde{x} \\in R^p |$ is limited to $N$, the number of cluster centroids $\\hat{x}$, and that the $\\tilde{x}$ are further restricted to lie on a planar grid. This restriction entails a discretization of the output space $R^p$.\n\n4 Online K-means clustering plus Sammon mapping of the cluster centroids\n\nOur new combined approach consists of simply finding the set $A = \\{\\hat{x}_i;\\ i = 1, \\ldots, N\\}$ of code book vectors that give the minimum distortion partition $P(A) = \\{S_i;\\ i = 1, \\ldots, N\\}$ via the oKMC algorithm and then using the $\\hat{x}_i$ as input vectors to Sammon mapping, thereby obtaining a two dimensional representation of the $\\hat{x}_i$ by minimizing formula (5). Contrary to SOM, this two dimensional representation is not restricted to any fixed form and the distances between the $N$ mapped $\\hat{x}_i$ directly correspond to those in the original higher dimension. This combined algorithm is abbreviated oKMC+.\n\n5 Empirical comparison\n\nThe empirical comparison was done using a three-factorial experimental design with three dependent variables. The multivariate normal distributions were generated using the procedure by [Milligan & Cooper 85], which has since been used for several comparisons of cluster algorithms (see e.g. [Balakrishnan et al. 94]). The marginal normal distributions gave internal cohesion of the clusters by warranting that more than 99% of the data lie within 3 standard deviations ($\\sigma$). External isolation was defined as having the first dimension nonoverlapping by truncating the normal distributions in the first dimension to $\\pm 2\\sigma$ and defining the cluster centroids to be $4.5\\sigma$ apart. In all other dimensions the clusters were allowed to overlap by setting the distance per dimension between two centroids randomly to lie between $\\pm 6\\sigma$. The data was normalized to zero mean and unit variance in all dimensions.\n\nFootnote 2: Note that for MDS not the actual coordinates of the points in the input space but only their distances or the ordering of the latter are needed.\n\nFootnote 3: [Erwin et al. 92] even showed that such an objective function cannot exist for SOM.\n\nalgorithm     no. clusters   dimension   msqe   Rand   corr.\nSOM           4              4           0.53   1.00   0.64\nSOM           4              6           1.53   0.91   0.72\nSOM           4              8           1.15   0.99   0.74\nSOM           9              4           0.33   0.97   0.48\nSOM           9              6           0.54   0.97   0.66\nSOM           9              8           0.81   0.96   0.74\nmean SOM                                 0.81   0.97   0.67\noKMC+         4              4           0.53   0.99   0.87\noKMC+         4              6           1.06   0.99   0.87\noKMC+         4              8           1.17   1.00   0.91\noKMC+         9              4           0.29   0.98   0.89\noKMC+         9              6           0.47   0.99   0.87\noKMC+         9              8           0.56   0.98   0.86\nmean oKMC+                               0.68   0.99   0.88\n\nFactor 1, Type of algorithm: The number of code book vectors of both the SOM and the oKMC+ was set equal to the number of clusters known to be in the data. The SOMs were planar grids consisting of 2 x 2 (3 x 3) code book vectors. During the first phase (1000 code book updates) $\\alpha$ was set to 0.05 and the radius of the neighbourhood to 2 (5). During the second phase (10000 code book updates) $\\alpha$ was set to 0.02 and the radius of the neighbourhood to 0 to guarantee the most accurate vector quantization [Kohonen 95, p.128]. The oKMC+ algorithm had the parameter $\\alpha$ fixed to 0.02 and was trained using each data set 20 times; the minimization of formula (5) was stopped after 100 iterations. Both SOM and oKMC+ were run 10 times on each data set and only the best solutions, in terms of mean squared error, were used for further analysis.\n\nFactor 2, Number of clusters, was set to 4 and 9.\n\nFactor 3, Number of dimensions, was set to 4, 6, or 8.\n\nDependent variable 1, mean squared error, was computed using formula (1).\n\nDependent variable 2, Rand index (see [Hubert & Arabie 85]), is a measure of agreement between the true, known partition structure and the obtained clusters. Both the numerator and the denominator of the index reflect frequency counts. 
The numerator is the number of times a pair of data points is either in the same or in different clusters in both the known and the obtained clustering, over all possible comparisons of data points. Since the denominator is the total number of all possible pairwise comparisons, an index value of 1.0 indicates an exact match of the clusterings.\n\nDependent variable 3, correlation, is a measure of the topology preserving abilities of the algorithms. The Pearson correlation of the distances $d(x_i, x_j)$ in the input space and the distances $\\tilde{d}(\\tilde{x}_i, \\tilde{x}_j)$ in the output space for all possible pairwise comparisons of data points is computed. Note that for SOM the coordinates of the code book vectors on the planar grid were used to compute the $\\tilde{d}$. An algorithm that preserves all distances in every neighbourhood would produce an isometric image and yield a value of 1.0 (see [Bezdek & Nikhil 95] for a discussion of measures of topology preservation).\n\nFor each cell in the full-factorial 2 x 2 x 3 design 3 data sets with 25 points for each cluster were generated, resulting in a total of 36 data sets. A multivariate analysis of variance (MANOVA) yielded the following significant effects at the .05 error level:\n\nThe mean squared error is lower for oKMC+ than for SOM, it is lower for the 9-cluster problem than for the 4-cluster problem and is higher for higher dimensional data. There is also a combined effect of the number of clusters and dimensions on the mean squared error. The Rand index is higher for oKMC+ than for SOM; there is also a combined effect of the number of clusters and dimensions. The correlation index is higher for oKMC+ than for SOM. 
Since the main interest of this study is the effect of the type of algorithm on the dependent variables, the mean performances for SOM and oKMC+ are listed in the rows 'mean SOM' and 'mean oKMC+' of the table. Note that the overall differences in the performances of the two algorithms are blurred by the significant effects of the other factors and that therefore the differences of the grand means across the type of algorithm appear rather small. Only by applying a MANOVA could effects of the factor 'type of algorithm' that are masked by additional effects of the other two factors 'number of clusters' and 'number of dimensions' still be detected.\n\n6 Discussion and Conclusion\n\nFrom the theoretical comparison of SOM to oKMC it should be clear that, in terms of quantization error, SOM can only perform as well as oKMC if SOM's neighbourhood is set to zero. Additional experiments, not reported here in detail for brevity, with nonzero neighbourhood until the end of SOM training gave even worse results, since the neighbourhood tends to pull the obtained cluster centroids away from the true ones. The Rand index is only slightly better for oKMC+. The high values indicate that both algorithms were able to recover the known cluster structure. Topology preservation is where SOM performs worst compared to oKMC+. This is a direct implication of the restriction to planar grids, which allows only $\\sum_{i=2}^{s} i$ different distances in an $s \\times s$ planar grid instead of $N(N-1)/2$ different distances for $N = s \\times s$ cluster centroids mapped via Sammon mapping in the case of oKMC+. Using a nonzero neighbourhood at the end of SOM training did not warrant any significant improvements. 
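\n\nTo make this restriction concrete for the grid sizes used in Section 5 (a worked illustration, assuming unit spacing between neighbouring grid nodes and Euclidean distances measured on the grid): a 3 x 3 grid offers only the distinct inter-node distances\n\n$$1, \\quad \\sqrt{2}, \\quad 2, \\quad \\sqrt{5}, \\quad 2\\sqrt{2},$$\n\nthat is 5 values, and a 2 x 2 grid only $1$ and $\\sqrt{2}$, whereas $N = 9$ ($N = 4$) freely placed points have $N(N-1)/2 = 36$ ($6$) pairwise distances that can all differ.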
\n\nAn argument that could be brought forward against our approach to comparing SOM and oKMC+ is that it would be unfair or incorrect to set the number of SOM's code book vectors equal to the number of clusters known to be in the data. In fact it seems to be common practice to apply SOM with numbers of code book vectors that are a multiple of the number of input vectors available for training (see e.g. [Kohonen 95, pp.113]). Two things have to be said against such an argument: First, if one uses more, or even only the same number of, code book vectors than input vectors during vector quantization, each code book vector will become identical to one of the input vectors in the limit of learning. So every $x_i$ is replaced with an identical $\\hat{x}_i$, which does not make any sense and runs counter to every notion of vector quantization. This means that SOMs employing numbers of code book vectors that are a multiple of the number of input vectors available can be used for MDS only. Second, even such big SOMs do MDS in a very crude way: We computed SOMs consisting of either 20 x 20 (for data sets consisting of 4 clusters and 100 points) or 30 x 30 (for data sets consisting of 9 clusters and 225 points) code book vectors for all 36 data sets, which gave an average correlation of 0.77 between the input space distances $d$ and the output space distances $\\tilde{d}$. This is significantly worse at the .05 error level compared to the average correlation of 0.95 achieved by Sammon mapping applied to the input data directly.\n\nOur data sets consisted of iid multivariate normal distributions, which therefore have spherical shape. All VQ algorithms using squared distances as a distortion measure, including our versions of oKMC as well as SOM, are inherently designed for such distributions. 
Therefore, the clustering problems in this study, being also perfectly separable in one dimension, were very simple and should be solvable with little or no error by any clustering or MDS algorithm.\n\nIn this work we examined the vague concept of using SOM as a \"data visualization tool\" from both a theoretical and an empirical point of view. SOM cannot outperform traditional VQ techniques in terms of quantization error and should therefore not be used for doing VQ. From [Bezdek & Nikhil 95] as well as from our discussion of SOM's restriction to planar grids in the output space, which allows only a restricted number of different distances to be represented, it should be evident that SOM is also a rather crude way of doing MDS. Our own empirical results show that if one wants to have an algorithm that does both VQ and MDS at the same time, there exists a very simple combination of traditional techniques (our oKMC+) with well-known and established properties that clearly outperforms SOM.\n\nWhether it is a good idea to combine clustering or vector quantization and multidimensional scaling at all, whether more principled approaches (see e.g. [Bishop et al. this volume], also for pointers to further related work) can yield even better results than our oKMC+, and, last but not least, what self-organizing maps should be used for in this new light remain questions to be answered by future investigations.\n\nAcknowledgements: Thanks are due to James Pardey, University of Oxford, for the Sammon code. The SOM_PAK, Helsinki University of Technology, was used for all computations of self-organizing maps. 
This work has been started within the framework of the BIOMED-1 concerted action ANNDEE, sponsored by the European Commission, DG XII, and the Austrian Federal Ministry of Science, Transport, and the Arts, which is also supporting the Austrian Research Institute for Artificial Intelligence. The author is supported by a doctoral grant of the Austrian Academy of Sciences.\n\nReferences\n\n[Balakrishnan et al. 94] Balakrishnan P.V., Cooper M.C., Jacob V.S., Lewis P.A.: A study of the classification capabilities of neural networks using unsupervised learning: a comparison with k-means clustering, Psychometrika, Vol. 59, No. 4, 509-525, 1994.\n\n[Bezdek & Nikhil 95] Bezdek J.C., Nikhil R.P.: An index of topological preservation for feature extraction, Pattern Recognition, Vol. 28, No. 3, pp. 381-391, 1995.\n\n[Bishop et al. this volume] Bishop C.M., Svensen M., Williams C.K.I.: GTM: A Principled Alternative to the Self-Organizing Map, this volume.\n\n[Bottou & Bengio 95] Bottou L., Bengio Y.: Convergence Properties of the K-Means Algorithms, in Tesauro G., et al. (eds.), Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, pp. 585-592, 1995.\n\n[Erwin et al. 92] Erwin E., Obermayer K., Schulten K.: Self-organizing maps: ordering, convergence properties and energy functions, Biological Cybernetics, 67, 47-55, 1992.\n\n[Hubert & Arabie 85] Hubert L.J., Arabie P.: Comparing partitions, J. of Classification, 2, 63-76, 1985.\n\n[Jolliffe 86] Jolliffe I.T.: Principal Component Analysis, Springer, 1986.\n\n[Kohonen 84] Kohonen T.: Self-Organization and Associative Memory, Springer, 1984.\n\n[Kohonen 95] Kohonen T.: Self-organizing maps, Springer, Berlin, 1995.\n\n[Linde et al. 80] Linde Y.
, Buzo A., Gray R.M.: An Algorithm for Vector Quantizer Design, IEEE Transactions on Communications, Vol. COM-28, No. 1, January, 1980.\n\n[MacQueen 67] MacQueen J.: Some Methods for Classification and Analysis of Multivariate Observations, Proc. of the Fifth Berkeley Symposium on Math., Stat. and Prob., Vol. 1, pp. 281-296, 1967.\n\n[Milligan & Cooper 85] Milligan G.W., Cooper M.C.: An examination of procedures for determining the number of clusters in a data set, Psychometrika 50(2), 159-179, 1985.\n\n[Sammon 69] Sammon J.W.: A Nonlinear Mapping for Data Structure Analysis, IEEE Transactions on Comp., Vol. C-18, No. 5, pp. 401-409, 1969.\n\n[Shepard 62] Shepard R.N.: The analysis of proximities: multidimensional scaling with an unknown distance function. I., Psychometrika, Vol. 27, No. 2, pp. 125-140, 1962.\n\n[Torgerson 52] Torgerson W.S.: Multidimensional Scaling, I: theory and method, Psychometrika, 17, 401-419, 1952.\n", "award": [], "sourceid": 1295, "authors": [{"given_name": "Arthur", "family_name": "Flexer", "institution": null}]}