{"title": "Laterally Interconnected Self-Organizing Maps in Hand-Written Digit Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 736, "page_last": 742, "abstract": null, "full_text": "Laterally Interconnected  Self-Organizing \nMaps  in Hand-Written Digit  Recognition \n\nYoonsuck Choe,  Joseph Sirosh, and Risto Miikkulainen \n\nDepartment of Computer Sciences \nThe University of Texas  at Austin \n\nAustin,  TX 78712 \n\nyschoe,sirosh,risto@cs. u texas .ed u \n\nAbstract \n\nAn  application  of  laterally  interconnected  self-organizing  maps \n(LISSOM)  to handwritten  digit recognition  is  presented.  The lat(cid:173)\neral connections  learn the correlations of activity between units on \nthe  map.  The  resulting  excitatory  connections  focus  the  activity \ninto local patches and the inhibitory connections decorrelate redun(cid:173)\ndant activity on the map.  The map thus forms internal representa(cid:173)\ntions that are easy to recognize with e.g.  a perceptron network.  The \nrecognition rate on a subset of NIST database 3 is 4.0% higher with \nLISSOM  than  with  a  regular  Self-Organizing  Map  (SOM)  as  the \nfront  end,  and 15.8% higher than recognition of raw input bitmaps \ndirectly.  These results form a promising starting point for  building \npattern  recognition systems with a  LISSOM  map as  a  front  end. \n\nIntroduction \n\n1 \nHand-written digit recognition has become one of the touchstone problems in neural \nnetworks recently.  Large databases of training examples such as the NIST (National \nInstitute of Standards and Technology)  Special  Database 3 have  become available, \nand real-world applications with clear practical value, such as recognizing zip codes \nin  letters,  have  emerged.  Diverse  architectures  with  varying  learning  rules  have \nbeen  proposed,  including feed-forward  networks  (Denker  et  al. 1989;  Ie  Cun  et  al. \n1990;  Martin  and  Pittman  1990),  self-organizing maps  (Allinson  et  al. 1994),  and \ndedicated  approaches such  as  the neocognitron  (Fukushima and Wake  1990) . \n\nThe  problem  is  difficult  because  handwriting  varies  a  lot,  some  digits  are  easily \nconfusable,  and recognition must be based on small but crucial differences.  For ex(cid:173)\nample, the digits 3 and 8,  4  and 9,  and 1 and 7 have several  overlapping segments, \nand  the  differences  are  often  lost  in  the  noise.  Thus,  hand-written  digit  recogni(cid:173)\ntion can  be  seen  as  a  process  of identifying the  distinct features  and producing  an \ninternal  representation  where  the  significant differences  are  magnified,  making the \nrecognition easier. \n\n\fLaterally  Interconnected  Self-organizing  Maps  in  Handwritten  Digit  Recognition \n\n737 \n\nIn this  paper,  the  Laterally Interconnected  Synergetically Self-Organizing  Map ar(cid:173)\nchitecture  (LISSOM;  Sirosh  and  Miikkulainen  1994,  1995,  1996)  was  employed  to \nform such a separable representation.  The lateral inhibitory connections of the LIS(cid:173)\nSOM map decorrelate features in the input, retaining only those differences  that are \nthe  most significant .  Using  LISSOM  as  a  front  end,  the  actual  recognition  can be \nperformed by  any  standard  neural network  architecture,  such  as  the perceptron. 

The experiments showed that while direct recognition of the digit bitmaps with a simple perceptron network is successful 72.3% of the time, and recognizing them using a standard self-organizing map (SOM) as the front end 84.1% of the time, the recognition rate is 88.1% based on the LISSOM network. These results suggest that LISSOM can serve as an effective front end for real-world handwritten character recognition systems.

2 The Recognition System

2.1 Overall architecture

The system consists of two networks: a 20 x 20 LISSOM map performs the feature analysis and decorrelation of the input, and a single layer of 10 perceptrons the final recognition (Figure 1 (a)). The input digit is represented as a bitmap on the 32 x 32 input layer. Each LISSOM unit is fully connected to the input layer through the afferent connections, and to the other units in the map through lateral excitatory and inhibitory connections (Figure 1 (b)). The excitatory connections are short range, connecting only to the closest neighbors of the unit, but the inhibitory connections cover the whole map. The perceptron layer consists of 10 units, corresponding to digits 0 to 9. The perceptrons are fully connected to the LISSOM map, receiving the full activation pattern on the map as their input. The perceptron weights are learned through the delta rule, and the LISSOM afferent and lateral weights through Hebbian learning.

[Figure 1: The system architecture. Panel (a) shows the output layer (10 units), the LISSOM map layer (20 x 20), and the input layer (32 x 32). The input layer is activated according to the bitmap image of digit 6. The activation propagates through the afferent connections to the LISSOM map, and settles through its lateral connections into a stable pattern. This pattern is the internal representation of the input that is then recognized by the perceptron layer. Through the connections from LISSOM to the perceptrons, the unit representing 6 is strongly activated, with weak activations on other units such as 3 and 8. Panel (b) shows the lateral connections to unit (i,j), indicated by the dark square. The neighborhood of excitatory connections (lightly shaded) is elevated from the map for a clearer view. The units in the excitatory region also have inhibitory lateral connections (indicated by medium shading) to the center unit. The excitatory radius is 1 and the inhibitory radius 3 in this case.]

2.2 LISSOM Activity Generation and Weight Adaptation

The afferent and lateral weights in LISSOM are learned through Hebbian adaptation. A bitmap image is presented to the input layer, and the initial activity of the map is calculated as the weighted sum of the input. For unit (i,j), the initial response \eta_{ij} is

    \eta_{ij} = \sigma\Big( \sum_{a,b} \xi_{ab}\,\mu_{ij,ab} \Big),    (1)

where \xi_{ab} is the activation of input unit (a,b), \mu_{ij,ab} is the afferent weight connecting input unit (a,b) to map unit (i,j), and \sigma is a piecewise linear approximation of the sigmoid activation function. The activity is then settled through the lateral connections. Each new activity \eta_{ij}(t) at step t depends on the afferent activation and the lateral excitation and inhibition:

    \eta_{ij}(t) = \sigma\Big( \sum_{a,b} \xi_{ab}\,\mu_{ij,ab} + \gamma_e \sum_{k,l} E_{ij,kl}\,\eta_{kl}(t-1) - \gamma_i \sum_{k,l} I_{ij,kl}\,\eta_{kl}(t-1) \Big),    (2)

where E_{ij,kl} and I_{ij,kl} are the excitatory and inhibitory connection weights from map unit (k,l) to (i,j) and \eta_{kl}(t-1) is the activation of unit (k,l) during the previous time step. The constants \gamma_e and \gamma_i control the relative strength of the lateral excitation and inhibition.
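
The settling computation of equations (1) and (2) can be illustrated with a short sketch. This is not the authors' implementation: the array shapes follow the 32 x 32 input and 20 x 20 map described above, but the function names, the thresholds of the piecewise linear sigma, the constants gamma_e and gamma_i, and the number of settling steps are illustrative assumptions.

    import numpy as np

    def piecewise_sigmoid(x, lower=0.1, upper=0.65):
        # Piecewise linear approximation of the sigmoid: 0 below the lower
        # threshold, 1 above the upper threshold, linear in between.
        # The threshold values are placeholders, not taken from the paper.
        return np.clip((x - lower) / (upper - lower), 0.0, 1.0)

    def settle(bitmap, mu, E, I, gamma_e=0.9, gamma_i=0.9, steps=10):
        """Settled LISSOM activity for one input bitmap (sketch).

        bitmap : (32, 32) input activations xi_ab
        mu     : (20, 20, 32, 32) afferent weights mu_ij,ab
        E, I   : (20, 20, 20, 20) excitatory / inhibitory lateral weights
        """
        # Equation (1): initial response from the afferent weights only.
        afferent = np.einsum('ijab,ab->ij', mu, bitmap)
        eta = piecewise_sigmoid(afferent)

        # Equation (2): iterate the lateral settling for a fixed number of steps.
        for _ in range(steps):
            excite = np.einsum('ijkl,kl->ij', E, eta)
            inhibit = np.einsum('ijkl,kl->ij', I, eta)
            eta = piecewise_sigmoid(afferent + gamma_e * excite - gamma_i * inhibit)
        return eta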

After the activity has settled, the afferent and lateral weights are modified according to the Hebb rule. Afferent weights are normalized so that the length of the weight vector remains the same; lateral weights are normalized to keep the sum of weights constant (Sirosh and Miikkulainen 1994):

    \mu_{ij,mn}(t+1) = \frac{\mu_{ij,mn}(t) + \alpha_{inp}\,\eta_{ij}\,\xi_{mn}}{\sqrt{\sum_{mn}\big[\mu_{ij,mn}(t) + \alpha_{inp}\,\eta_{ij}\,\xi_{mn}\big]^2}},    (3)

    w_{ij,kl}(t+1) = \frac{w_{ij,kl}(t) + \alpha\,\eta_{ij}\,\eta_{kl}}{\sum_{kl}\big[w_{ij,kl}(t) + \alpha\,\eta_{ij}\,\eta_{kl}\big]},    (4)

where \mu_{ij,mn} is the afferent weight from input unit (m,n) to map unit (i,j), and \alpha_{inp} is the input learning rate; w_{ij,kl} is the lateral weight (either excitatory E_{ij,kl} or inhibitory I_{ij,kl}) from map unit (k,l) to (i,j), and \alpha is the lateral learning rate (either \alpha_{exc} or \alpha_{inh}).

2.3 Perceptron Output Generation and Weight Adaptation

The perceptrons at the output of the system receive the activation pattern on the LISSOM map as their input. The perceptrons are trained after the LISSOM map has been organized. The activation for the perceptron unit O_m is

    O_m = C \sum_{i,j} \eta_{ij}\,v_{ij,m},    (5)

where C is a scaling constant, \eta_{ij} is the activity of LISSOM map unit (i,j), and v_{ij,m} is the connection weight between LISSOM map unit (i,j) and output layer unit m. The delta rule is used to train the perceptrons: the weight adaptation is proportional to the map activity and the difference between the output and the target:

    v_{ij,m}(t+1) = v_{ij,m}(t) + \alpha_{out}\,\eta_{ij}\,(\zeta_m - O_m),    (6)

where \alpha_{out} is the learning rate of the perceptron weights, \eta_{ij} is the LISSOM map unit activity, and \zeta_m is the target activation for unit m (\zeta_m = 1 if the correct digit is m, 0 otherwise).
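
The weight adaptation of equations (3)-(6) can be sketched in the same style as above. This is only an illustration under assumed array shapes; the learning rates alpha_inp, alpha_exc, alpha_inh, alpha_out and the scaling constant C are placeholders, not values reported in the paper.

    import numpy as np

    def hebbian_update(mu, w_exc, w_inh, bitmap, eta,
                       alpha_inp=0.002, alpha_exc=0.002, alpha_inh=0.00025):
        # Equation (3): Hebbian update of the afferent weights, then
        # normalization so each unit's afferent weight vector keeps unit length.
        mu = mu + alpha_inp * eta[:, :, None, None] * bitmap[None, None, :, :]
        mu = mu / np.sqrt((mu ** 2).sum(axis=(2, 3), keepdims=True))

        # Equation (4): Hebbian update of the lateral weights, normalized so
        # that the total lateral weight of each unit stays constant.
        def lateral(w, alpha):
            w = w + alpha * eta[:, :, None, None] * eta[None, None, :, :]
            return w / w.sum(axis=(2, 3), keepdims=True)

        return mu, lateral(w_exc, alpha_exc), lateral(w_inh, alpha_inh)

    def delta_rule(v, eta, target_digit, alpha_out=0.01, C=1.0):
        # Equation (5): perceptron outputs from the settled map activity.
        o = C * np.einsum('ij,ijm->m', eta, v)
        # Equation (6): delta-rule update toward the one-hot target zeta.
        zeta = np.zeros(10)
        zeta[target_digit] = 1.0
        v = v + alpha_out * eta[:, :, None] * (zeta - o)[None, None, :]
        return v, o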

    Representation   Training      Test
    LISSOM           93.0 / 0.76   88.1 / 3.10
    SOM              84.5 / 0.68   84.1 / 1.71
    Raw Input        99.2 / 0.06   72.3 / 5.06

Table 1: Final Recognition Results. The average recognition percentage and its variance over the 10 different splits are shown for the training and test sets. The differences in each set are statistically significant with p > .9999.

3 Experiments

A subset of 2992 patterns from the NIST Database 3 was used as training and testing data.¹ The patterns were normalized to make sure that each example had an equal effect on the LISSOM map (Sirosh and Miikkulainen 1994). LISSOM was trained with 2000 patterns. Of these, 1700 were used to train the perceptron layer, and the remaining 300 were used as the validation set to determine when to stop training the perceptrons. The final recognition performance of the whole system was measured on the remaining 992 patterns, which neither LISSOM nor the perceptrons had seen during training. The experiment was repeated 10 times with different random splits of the 2992 input patterns into training, validation, and testing sets.

¹ Downloadable at ftp://sequoyah.ncsl.nist.gov/pub/databases/.

The LISSOM map can be organized starting from initially random weights. However, if the input dimensionality is large, as it is in case of the 32 x 32 bitmaps, each unit on the map is activated roughly to the same degree, and it is difficult to bootstrap the self-organizing process (Sirosh and Miikkulainen 1994, 1996). The standard Self-Organizing Map algorithm can be used to preorganize the map in this case. The SOM performs preliminary feature analysis of the input, and forms a coarse topological map of the input space. This map can then be used as the starting point for the LISSOM algorithm, which modifies the topological organization and learns lateral connections that decorrelate and represent a more clear categorization of the input patterns.

The initial self-organizing map was formed in 8 epochs over the training set, gradually reducing the neighborhood radius from 20 to 8. The lateral connections were then added to the system, and over another 30 epochs, the afferent and lateral weights of the map were adapted according to equations 3 and 4. In the beginning, the excitation radius was set to 8 and the inhibition radius to 20. The excitation radius was gradually decreased to 1, making the activity patterns more concentrated and causing the units to become more selective to particular types of input patterns. For comparison, the initial self-organized map was also trained for another 30 epochs, gradually decreasing the neighborhood size to 1 as well. The final afferent weights for the SOM and LISSOM maps are shown in figures 2 and 3.

After the SOM and LISSOM maps were organized, a complete set of activation patterns on the two maps was collected. These patterns then formed the training input for the perceptron layer. Two separate versions were each trained for 500 epochs, one with SOM and the other with LISSOM patterns. A third perceptron layer was trained directly with the input bitmaps as well.

Recognition performance was measured by counting how often the most highly active perceptron unit was the correct one, as sketched below. The results were averaged over the 10 different splits.
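
A minimal sketch of this accuracy measurement follows, assuming the settled map activities and the trained perceptron weights in the shapes used in the sketches above; the function and argument names are illustrative, not from the original implementation.

    import numpy as np

    def recognition_rate(test_maps, test_labels, v, C=1.0):
        """Fraction of test patterns whose most active perceptron unit
        matches the correct digit.

        test_maps   : (N, 20, 20) settled LISSOM activity patterns
        test_labels : (N,) correct digits 0-9
        v           : (20, 20, 10) perceptron weights
        """
        outputs = C * np.einsum('nij,ijm->nm', test_maps, v)  # equation (5) per pattern
        predicted = outputs.argmax(axis=1)                    # most highly active unit
        return (predicted == test_labels).mean()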

On average, the final LISSOM+perceptron system correctly recognized 88.1% of the 992-pattern test sets. This is significantly better than the 84.1% of the SOM+perceptron system, and the 72.3% achieved by the perceptron layer alone (Table 1). These results suggest that the internal representations generated by the LISSOM map are more distinct and easier to recognize than the raw input patterns and the representations generated by the SOM map.

[Figure 2: Final Afferent Weights of the SOM map. The digit-like patterns represent the afferent weights of each map unit projected on the input layer. For example, the lower left corner represents the afferent weights of unit (0,0). High weight values are shown in black and low in white. The pattern of weights shows the input pattern to which this unit is most sensitive (6 in this case). There are local clusters sensitive to each digit category.]

[Figure 3: Final Afferent Weights of the LISSOM map. The squares identify the above-average inhibitory lateral connections to unit (10,4) (indicated by the thick square). Note that inhibition comes mostly from areas of similar functionality (i.e. areas sensitive to similar input), thereby decorrelating the map activity and forming a sparser representation of the input.]

4 Discussion

The architecture was motivated by the hypothesis that the lateral inhibitory connections of the LISSOM map would decorrelate and force the map activity patterns to become more distinct. The recognition could then be performed by even the simplest classification architectures, such as the perceptron. Indeed, the LISSOM representations were easier to recognize than the SOM patterns, which lends evidential support to the hypothesis. In additional experiments, the perceptron output layer was replaced by a two-weight-layer backpropagation network and a Hebbian associator net, and trained with the same patterns as the perceptrons. The recognition results were practically the same for the perceptron, backpropagation, and Hebbian output networks, indicating that the internal representations formed by the LISSOM map are the crucially important part of the recognition system.

A comparison of the learning curves reveals two interesting effects (figure 4). First, even though the perceptron net trained with the raw input patterns initially performs well on the test set, its generalization decreases dramatically during training. This is because the net only learns to memorize the training examples, which does not help much with new noisy patterns. Good internal representations are therefore crucial for generalization. Second, even though initially the settling process of the LISSOM map forms patterns that are significantly easier to recognize than the initial, unsettled patterns (formed through the afferent connections only), this difference becomes insignificant later during training.
The afferent connections are modified according to the final, settled patterns, and gradually learn to anticipate the decorrelated internal representations that the lateral connections form.

[Figure 4: Comparison of the learning curves (test-set recognition accuracy in % correct versus training epochs, 0 to 500). A perceptron network was trained to recognize four different kinds of internal representations: the settled LISSOM patterns, the LISSOM patterns before settling, the patterns on the final SOM network, and raw input bitmaps. The recognition accuracy on the test set was then measured and averaged over 10 simulations. The generalization of the raw input + perceptron system decreases rapidly as the net learns to memorize the training patterns. The difference of using settled and unsettled LISSOM patterns diminishes as the afferent weights of LISSOM learn to take into account the decorrelation performed by the lateral weights.]

5 Conclusion

The experiments reported in this paper show that LISSOM forms internal representations of the input patterns that are easier to categorize than the raw inputs and the patterns on the SOM map, and suggest that LISSOM can form a useful front end for character recognition systems, and perhaps for other pattern recognition systems as well (such as speech). The main direction of future work is to apply the approach to larger data sets, including the full NIST 3 database, to use a more powerful recognition network instead of the perceptron, and to increase the map size to obtain a richer representation of the input space.

Acknowledgements

This research was supported in part by the National Science Foundation under grant #IRI-9309273. Computer time for the simulations was provided by the Pittsburgh Supercomputing Center under grants IRI930005P and IRI940004P, and by a High Performance Computer Time Grant from the University of Texas at Austin.

References

Allinson, N. M., Johnson, M. J., and Moon, K. J. (1994). Digital realisation of self-organising maps. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 6. San Mateo, CA: Morgan Kaufmann.

Denker, J. S., Gardner, W. R., Graf, H. P., Henderson, D., Howard, R. E., Hubbard, W., Jackel, L. D., Baird, H. S., and Guyon, I. (1989). Neural network recognizer for hand-written zip code digits. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1. San Mateo, CA: Morgan Kaufmann.

Fukushima, K., and Wake, N. (1990). Alphanumeric character recognition by neocognitron. In Advanced Neural Computers, 263-270. Elsevier Science Publishers B.V. (North-Holland).

le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2. San Mateo, CA: Morgan Kaufmann.

Martin, G. L., and Pittman, J. A. (1990). Recognizing hand-printed letters and digits. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2. San Mateo, CA: Morgan Kaufmann.

Sirosh, J., and Miikkulainen, R. (1994). Cooperative self-organization of afferent and lateral connections in cortical maps. Biological Cybernetics, 71:66-78.

Sirosh, J., and Miikkulainen, R. (1995). Ocular dominance and patterned lateral connections in a self-organizing model of the primary visual cortex. In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7. Cambridge, MA: MIT Press.

Sirosh, J., and Miikkulainen, R. (1996). Topographic receptive fields and patterned lateral interaction in a self-organizing model of the primary visual cortex. Neural Computation (in press).
", "award": [], "sourceid": 1149, "authors": [{"given_name": "Yoonsuck", "family_name": "Choe", "institution": null}, {"given_name": "Joseph", "family_name": "Sirosh", "institution": null}, {"given_name": "Risto", "family_name": "Miikkulainen", "institution": null}]}