{"title": "The Performance of Convex Set Projection Based Neural Networks", "book": "Neural Information Processing Systems", "page_first": 534, "page_last": 543, "abstract": null, "full_text": "534 \n\nThe  Performance  of  Convex  Set  projection  Based  Neural  Networks \n\nRobert  J.  Marks  II,  Les  E.  Atlas,  Seho  Oh  and  James  A.  Ritcey \n\nInteractive  Systems  Design  Lab,  FT-IO \n\nUniversity  of  Washington,  Seattle,  Wa  98195. \n\nABSTRACT \n\nand \n\nsignal \n\nvisualized \n\ngeometrically \n\na \nnetworks \n\nWe  donsider  a  class  of  neural  networks  whose  performance  can  be \nanalyzed \nin \nspace \n(APNN' s) \nenvironment.  Alternating  projection  neural \nperform  by  alternately  projecting  between \ntwo  or  more  constraint \nsets.  Criteria \nfor  desired  and  unique  convergence  are  easily \nestablished.  The  network  can  be  configured  in  either  a  homogeneous \nor  layered  form.  The  number  of  patterns  that  can  be  stored  in  the \nnetwork  is  on  the  order  of  the  number  of  input  and  hidden  neurons. \nIf  the  output  neurons  can  take  on  only  one  of  two  states,  then  the \ntrained  layered  APNN  can  be  easily  configured  to  converge  in  one \niteration.  More  generally,  convergence  is  at  an  exponential  rate. \nConvergence \ntype \nnonlinearities,  network  relaxation  and/or  increasing  the  number  of \nneurons \nthe  network \nresponds  to  data  for  which  it  was  not  specifically  trained  (i.e. \nhow  it  generalizes)  can  be  directly  evaluated  analytically. \n\nlayer.  The  manner \n\nthe  hidden \n\nin  which \n\nimproved \n\nsigmoid \n\nthe \n\ncan \n\nbe \n\nin \n\nby \n\nuse \n\nof \n\n1.  INTRODUCTION \n\nIn \n\nfrom \n\nthis  paper,  we  depart \n\nthe  performance  analysis \ntechniques  normally  applied  to  neural  networks.  Instead,  a  signal \nspace  approach  is  used  to  gain  new  insights  via  ease  of  analysis \nand  geometrical \nlaid \nthat  alternating  projecting  neural \nelsewhere l - 3 ,  we  demonstrate \nnetwork's \nsuch \ncan  be \nconfigured  in  layered  form  or  homogeneously. \n\ninterpretation.  Building  on \n\nfoundation \n\nformulated \n\nviewpoint \n\n(APNN's) \n\nfrom \n\na \n\na \n\nSignificiantly,  APNN's \n\nhave \n\nadvantages  over  other  neural \n\nnetwork  architectures .  For  example, \n(a)  APNN's  perform  by  alternatingly  projecting  between  two  or  more \n\nconstraint  sets.  Criteria  can  be  established  for  proper \niterative  convergence  for  both  synchronous  and  asynchronous \noperation.  This  is  in  contrast  to  the  more  conventional \ntechnique  of  formulation  of  an  energy  metric  for  the  neural \nnetworks,  establishing  a  lower  energy  bound  and  showing  that \nthe  energy  reduces  each  iteration4- 7 \u2022  Such  procedures  generally \ndo  not  address  the  accuracy  of  the  final  solution.  In  order  to \nassure  that  such  networks  arrive  at  the  desired  globally \nminimum  energy,  computationaly  lengthly  procedures  such  as \nsimulated  annealing  are  used B - 10 \u2022  For  synchronous  networks, \nsteady  state  oscillation  can  occur  between  two  states  of  the \nsame  energyll \n\n(b)  Homogeneous  neural  networks  such  as  Hopfield's  content \n\naddressable  memory4,12-14  do  not  scale  well,  i.e.  
the capacity of Hopfield's neural networks less than doubles when the number of neurons is doubled [15,16]. Also, the capacity of previously proposed layered neural networks [17,18] is not well understood. The capacity of the layered APNN's, on the other hand, is roughly equal to the number of input and hidden neurons [19].

(c) The speed of backward error propagation learning [17,18] can be painfully slow. Layered APNN's, on the other hand, can be trained in only one pass through the training data [2]. If the network memory does not saturate, new data can easily be learned without repeating previous data, and the effectiveness of recall of previous data is not diminished. Unlike layered back propagation neural networks, the APNN recalls by iteration. In certain important applications, however, the APNN will recall in one iteration.

(d) The manner in which a layered APNN generalizes to data for which it was not trained can be analyzed straightforwardly.

The outline of this paper is as follows. After establishing the dynamics of the APNN in the next section, sufficient criteria for proper convergence are given. The convergence dynamics of the APNN are then explored. Wise use of nonlinearities, e.g. of the sigmoidal type [2], improves the network's performance. Establishing a hidden layer of neurons whose states are a nonlinear function of the input neurons' states is shown to increase the network's capacity and its convergence rate as well. The manner in which the network responds to data outside of the training set is also addressed.

2. THE ALTERNATING PROJECTION NEURAL NETWORK

In this section, we establish the notation for the APNN. Nonlinear modifications to the network made to impose certain performance attributes are considered later.

Consider a set of N continuous level, linearly independent library vectors (or patterns) of length L > N: {f_n | 1 <= n <= N}. We form the library matrix F = [f_1 | f_2 | ... | f_N] and the neural network interconnect matrix[a] T = F (F^T F)^{-1} F^T, where the superscript T denotes transposition. We divide the L neurons into two sets: one in which the states are known and the remainder in which the states are unknown. This partition may change from application to application. Let s_k(M) be the state of the kth node at time M. If the kth node falls into the known category, its state is clamped to the known value (i.e. s_k(M) = f_k where f is some library vector). The states of the remaining floating neurons are equal to the sum of the inputs into the node. That is, s_k(M) = i_k, where

    i_k = sum_{p=1}^{L} t_{pk} s_p                                              (1)

[a] The interconnect matrix is better trained iteratively [2]. To include a new library vector f, the interconnects are updated as T <- T + (e e^T)/(e^T e), where e = (I - T) f.
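The projection interconnects and the iterative update in footnote [a] are straightforward to compute numerically. The following is a minimal NumPy sketch (our own illustration; the function names and the small random test at the end are not from the paper):

```python
import numpy as np

def interconnect_matrix(F):
    """Projection interconnects T = F (F^T F)^{-1} F^T onto the span of the
    library vectors.  F is L x N with linearly independent columns."""
    return F @ np.linalg.inv(F.T @ F) @ F.T

def add_library_vector(T, f):
    """Footnote (a): to learn a new library vector f, update
    T <- T + (e e^T) / (e^T e) with e = (I - T) f."""
    e = f - T @ f
    energy = float(e @ e)
    if energy < 1e-12:              # f already lies in the stored subspace
        return T
    return T + np.outer(e, e) / energy

# Building T in one shot and one vector at a time give the same projection matrix.
L, N = 6, 3
F = np.random.randn(L, N)
T_iter = np.zeros((L, L))
for n in range(N):
    T_iter = add_library_vector(T_iter, F[:, n])
print(np.allclose(interconnect_matrix(F), T_iter))    # -> True
```

The rank-one update only adds the component of the new vector orthogonal to the stored subspace, so previously stored vectors remain fixed points; this is the mechanism behind the one-pass training claims in the introduction.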
If all neurons change state simultaneously (i.e. s_p = s_p(M-1)), then the net is said to operate synchronously. If only one neuron changes state at a time, the network is operating asynchronously.

Let P be the number of clamped neurons. We have proven [1] that the neural states converge strongly to the extrapolated library vector if the first P rows of F (denoted F_P) form a matrix of full column rank. That is, no column of F_P can be expressed as a linear combination of those remaining [2]. By strong convergence[b], we mean lim_{M -> infinity} || s(M) - f || = 0, where ||x||^2 = x^T x.

Lastly, note that subsumed in the criterion that F_P be full rank is the condition that the number of library vectors not exceed the number of known neural states (P >= N). Techniques to bypass this restriction by using hidden neurons are discussed in section 5.

Partition Notation: Without loss of generality, we will assume that neurons 1 through P are clamped and the remaining neurons are floating. We adopt the vector partitioning notation

    s = [ s^P ]
        [ s^Q ]

where s^P is the P-tuple of the first P elements of s and s^Q is a vector of the remaining Q = L - P. We can thus write, for example, F_P = [ f_1^P | f_2^P | ... | f_N^P ]. Using this partition notation, we can define the neural clamping operator eta by

    eta s = [ f^P ]
            [ s^Q ]

Thus, the first P elements of s are clamped to f^P. The remaining Q nodes "float".

Partition notation for the interconnect matrix will also prove useful. Define

    T = [ T_2  T_1 ]
        [ T_3  T_4 ]

where T_2 is a P by P and T_4 a Q by Q matrix.

3. STEADY STATE CONVERGENCE PROOFS

For purposes of later reference, we address convergence of the network for synchronous operation. Asynchronous operation is addressed in reference [2]. For proper convergence, both cases require that F_P be full rank. For synchronous operation, the network iteration in (1) followed by clamping can be written as:

    s(M+1) = eta T s(M)                                                         (2)

As is illustrated in [1-3], this operation can easily be visualized in an L dimensional signal space.

[b] The referenced convergence proofs prove strong convergence in an infinite dimensional Hilbert space. In a discrete finite dimensional space, both strong and weak convergence imply uniform convergence [19,20], i.e. s(M) -> f as M -> infinity.
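The synchronous recall iteration (2) can be simulated directly. Below is a short sketch (our own construction; the floating states are started at zero and the helper name is ours, not the paper's):

```python
import numpy as np

def recall(T, f_partial, P, iterations=500):
    """Synchronous APNN recall, eq. (2): s(M+1) = eta T s(M).
    The first P neurons are clamped to the known partial library vector
    f_partial; the remaining Q = L - P neurons float (started at zero)."""
    s = np.zeros(T.shape[0])
    s[:P] = f_partial                # clamping operator eta
    for _ in range(iterations):
        s = T @ s                    # every neuron sums its weighted inputs, eq. (1)
        s[:P] = f_partial            # re-clamp the known neurons
    return s

# store three random library vectors, then extrapolate one from its first P elements
L, N, P = 6, 3, 4
F = np.random.randn(L, N)
T = F @ np.linalg.inv(F.T @ F) @ F.T
s = recall(T, F[:P, 1], P)
print(np.linalg.norm(s - F[:, 1]))   # small; shrinks further with more iterations
```

With F_P of full column rank, the floating states approach the missing portion of the stored vector at the exponential rate derived in section 4.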
For a given partition with P clamped neurons, (2) can be written in partitioned form as

    [ f^P      ]   [ T_2  T_1 ] [ f^P    ]
    [ s^Q(M+1) ] = [ T_3  T_4 ] [ s^Q(M) ]                                      (3)

The states of the P clamped neurons are not affected by their input sum. Thus, there is no contribution to the iteration by T_1 and T_2. We can equivalently write (3) as

    s^Q(M+1) = T_3 f^P + T_4 s^Q(M)                                             (4)

We show in [19] that if F_P is full rank, then the spectral radius (magnitude of the maximum eigenvalue) of T_4 is strictly less than one. It follows that the steady state solution of (4) is

    s^Q(infinity) = (I - T_4)^{-1} T_3 f^P                                      (5)

and, since F_P is full rank, this steady state is the desired extrapolation,

    s^Q(infinity) = f^Q                                                         (6)

4. CONVERGENCE DYNAMICS

In this section, we explore different convergence dynamics of the APNN when F_P is full column rank. If the library matrix displays certain orthogonality characteristics, or if there is a single output (floating) neuron, convergence can be achieved in a single iteration. More generally, convergence is at an exponential rate. Two techniques are presented to improve convergence. The first is standard relaxation. Use of a nonlinear convex constraint at each neuron is discussed elsewhere [2,19].

One Step Convergence: There are at least two important cases where the APNN converges other than uniformly in one iteration. Both require that the output be bipolar (plus or minus 1). Convergence is in one step in the sense that

    f^Q = sign s^Q(1)                                                           (7)

where the vector operation sign takes the sign of each element of the vector on which it operates.

CASE 1: If there is a single output neuron, then, from (4), (5) and (6) (initializing the floating state to zero), s^Q(1) = (1 - t_LL) f^Q. Since the eigenvalue of the (scalar) matrix T_4 = t_LL lies between zero and one [19], we conclude that 1 - t_LL > 0. Thus, if f^Q is restricted to plus or minus 1, (7) follows immediately. A technique to extend this result to an arbitrary number of output neurons in a layered network is discussed in section 7.

CASE 2: For certain library matrices, the APNN can also display one step convergence. We showed that if the columns of F are orthogonal and the columns of F_P are also orthogonal, then one synchronous iteration results in floating states proportional to the steady state values [19]. Specifically, for the floating neurons,

    s^Q(1) = ( ||f^P||^2 / ||f||^2 ) f^Q                                        (8)

An important special case of (8) is when the elements of F are all plus or minus 1 and orthogonal. If each element were chosen by a 50-50 coin flip, for example, we would expect (in the statistical sense) that this would be the case.
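As a numerical illustration of (5) through (7) (our own check with an arbitrary random library, not an experiment from the paper), one can build a network with a single bipolar output neuron, evaluate the fixed point of (4) in closed form, and confirm one-step sign recall:

```python
import numpy as np

L, N, P = 5, 3, 4                                   # one floating output neuron, Q = 1
F = np.vstack([np.random.randn(P, N),               # clamped rows F_P (full column rank)
               np.sign(np.random.randn(1, N))])     # bipolar (+/-1) output row
T = F @ np.linalg.inv(F.T @ F) @ F.T
T3, T4 = T[P:, :P], T[P:, P:]

f = F[:, 0]                                         # any stored library vector
s_inf = np.linalg.solve(np.eye(L - P) - T4, T3 @ f[:P])   # steady state, eq. (5)
s_one = T3 @ f[:P]                                  # one synchronous step from s^Q(0) = 0

print(np.allclose(s_inf, f[P:]))                    # True: steady state is f^Q, eq. (6)
print(np.sign(s_one) == np.sign(f[P:]))             # True: one-step convergence, eq. (7)
```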
Exponential Convergence: More generally, the convergence rate of the APNN is exponential and is a function of the eigenstructure of T_4. Let {v_r | 1 <= r <= Q} denote the eigenvectors of T_4 and {lambda_r} the corresponding eigenvalues. Define V = [v_1 | v_2 | ... | v_Q] and the diagonal matrix Lambda_4 such that diag Lambda_4 = [lambda_1 lambda_2 ... lambda_Q]^T. Then we can write T_4 = V Lambda_4 V^T. Define x(M) = V^T s^Q(M). Since V V^T = I, it follows from the difference equation in (4) that x(M+1) = V^T T_4 V V^T s^Q(M) + V^T T_3 f^P = Lambda_4 x(M) + g, where g = V^T T_3 f^P. The solution to this difference equation is

    x_k(M) = sum_{r=0}^{M} lambda_k^r g_k = [ (1 - lambda_k^{M+1}) / (1 - lambda_k) ] g_k        (9)

Since the spectral radius of T_4 is less than one [19], lambda_k^M -> 0 as M -> infinity. Our steady state result is thus x_k(infinity) = (1 - lambda_k)^{-1} g_k. Equation (9) can therefore be written as x_k(M) = [1 - lambda_k^{M+1}] x_k(infinity). The equivalent of a "time constant" in this exponential convergence is 1/ln(1/|lambda_k|). The speed of convergence is thus dictated by the spectral radius of T_4. As we show later [19], adding neurons in a hidden layer of an APNN can significantly reduce this spectral radius and thus improve the convergence rate.

Relaxation: Both the projection and clamping operations can be relaxed to alter the network's convergence without affecting its steady state [20,21]. For the interconnects, we choose an appropriate value of the relaxation parameter alpha in the interval (0,2) and redefine the interconnect matrix as T' = alpha T + (1 - alpha) I or, equivalently,

    t'_nm = { alpha (t_nn - 1) + 1 ,   n = m
            { alpha t_nm ,             n not equal m

To see the effect of such relaxation on convergence, we need simply examine the resulting eigenvalues. If T_4 has eigenvalues {lambda_r}, then T'_4 has eigenvalues lambda'_r = 1 + alpha (lambda_r - 1). A wise choice of alpha reduces the spectral radius of T'_4 with respect to that of T_4, and thus decreases the time constant of the network's convergence.

Any of the operators projecting onto convex sets can be relaxed without affecting steady state convergence [19,20]. These include the eta operator [2] and the sigmoid-type neural operator that projects onto a box. Choice of stationary relaxation parameters without numerical and/or empirical study of each specific case, however, generally remains more of an art than a science.
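The eigenvalue shift under relaxation is easy to see numerically. A small sketch with a toy 2 by 2 floating-neuron block (the matrix and the trial values of alpha below are ours, chosen only for illustration):

```python
import numpy as np

def relax(T, alpha):
    """Relaxed interconnects T' = alpha*T + (1 - alpha)*I, with 0 < alpha < 2."""
    return alpha * T + (1.0 - alpha) * np.eye(T.shape[0])

T4 = np.array([[0.6, 0.2],          # toy floating-neuron block, eigenvalues 0.7 and 0.2
               [0.2, 0.3]])
for alpha in (1.0, 1.3, 1.6):
    rho = max(abs(np.linalg.eigvals(relax(T4, alpha))))
    print(alpha, rho)               # spectral radius: 0.70, 0.61, 0.52
```

Each eigenvalue moves as lambda' = 1 + alpha (lambda - 1), so over-relaxation (alpha > 1) pulls the largest eigenvalue toward zero until the reflected image of the smallest one becomes dominant; the best stationary alpha balances the two.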
5. LAYERED APNN'S

The networks thus far considered are homogeneous in the sense that any neuron can be clamped or floating. If the partition is such that the same set of neurons always provides the network stimulus and the remainder respond, then the network can be simplified. Clamped neurons, for example, ignore the states of the other neurons. The corresponding interconnects can then be deleted from the neural network architecture. When the neurons are so partitioned, we will refer to the APNN as layered.

In this section, we explore various aspects of the layered APNN and, in particular, the use of a so called hidden layer of neurons to increase the storage capacity of the network. An alternate architecture for a homogeneous APNN that requires only Q neurons has been reported by Marks [2].

Hidden Layers: In its generic form, the APNN cannot perform a simple exclusive or (XOR). Indeed, failure to perform this same operation was a nail in the coffin of the perceptron [22]. Rumelhart et al. [17,18] revived the perceptron by adding additional layers of neurons. Although doing so allowed nonlinear discrimination, the iterative training of such networks can be painfully slow. With the addition of a hidden layer, the APNN likewise generalizes. In contrast, the APNN can be trained by looking at each data vector only once [1].

Although neural networks will not likely be used for performing XOR's, their use in explaining the role of hidden neurons is quite instructive. The library matrix for the XOR is

    F = [ 0  0  1  1 ]
        [ 0  1  0  1 ]
        [ 0  1  1  0 ]

The first two rows of F do not form a matrix of full column rank. Our approach is to augment F_P with two more rows such that the resulting matrix is full rank. Most any nonlinear combination of the first two rows will in general increase the matrix rank. Such a procedure, for example, is used in Phi-classifiers [23]. Possible nonlinear operations include multiplication, a logical "AND", and running a weighted sum of the clamped neural states through a memoryless nonlinearity such as a sigmoid. This latter alteration is particularly well suited to neural architectures.

To illustrate with the exclusive or (XOR), a new hidden neural state is set equal to the exponentiation of the sum of the first two rows. A second hidden neuron will be assigned a value equal to the cosine of the sum of the first two neural states multiplied by pi/2. (The choice of nonlinearities here is arbitrary.) The augmented library matrix is

    F+ = [ 0  0  1   1   ]
         [ 0  1  0   1   ]
         [ 1  e  e   e^2 ]
         [ 1  0  0  -1   ]
         [ 0  1  1   0   ]

In either the training or look-up mode, the states of the hidden neurons are clamped indirectly as a result of clamping the input neurons.

The playback architecture for this network is shown in Fig. 1. The interconnect values for the dashed lines are unity. The remaining interconnects are from the projection matrix formed from F+.

Figure 1. Illustration of a layered APNN for performing an XOR.

Geometrical Interpretation: In lower dimensions, the effects of hidden neurons can be nicely illustrated geometrically. Consider the library matrix

    F = [ 1/2  1   ]
        [ 1    1/2 ]

Clearly F_P = [ 1/2  1 ]. Let the neuron in the hidden layer be determined by the nonlinearity x^2, where x denotes the elements in the first row of F. Then

    F+ = [ f_1^+ | f_2^+ ] = [ 1/2  1   ]
                             [ 1/4  1   ]
                             [ 1    1/2 ]

The corresponding geometry is shown in Fig. 2 for x the input neuron, y the output and h the hidden neuron. The augmented library vectors are shown and a portion of the generated subspace is shown lightly shaded. The surface h = x^2 resembles a cylindrical lens in three dimensions. Note that the linear variety corresponding to x = 1/2 intersects the cylindrical lens and the subspace only at f_1^+. Similarly, the x = 1 plane intersects the lens and the subspace at f_2^+. Thus, in both cases, clamping the input corresponding to the first element of one of the two library vectors uniquely determines the library vector.

Figure 2. A geometrical illustration of the use of an x^2 nonlinearity to determine the states of hidden neurons.
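The rank-raising effect of the two hidden nonlinearities on the XOR library can be checked directly; a brief NumPy sketch (our own illustration of the augmentation described above):

```python
import numpy as np

# XOR library matrix: rows are the two inputs and the output, columns the four patterns.
F = np.array([[0., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])

x = F[0] + F[1]                                   # sum of the clamped (input) rows
F_plus = np.vstack([F[:2],                        # input rows
                    np.exp(x),                    # first hidden nonlinearity
                    np.cos(np.pi * x / 2),        # second hidden nonlinearity
                    F[2:]])                       # output row

print(np.linalg.matrix_rank(F[:2]))               # 2: inputs alone cannot store 4 patterns
print(np.linalg.matrix_rank(F_plus[:4]))          # 4: augmented clamped portion is full rank
```

With the clamped portion full rank, the projection matrix formed from F+ stores and recalls all four patterns.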
Convergence Improvement: Use of additional neurons in the hidden layer will improve the convergence rate of the APNN [19]. Specifically, the spectral radius of the T_4 matrix is decreased as additional neurons are added. The dominant time constant controlling convergence is thus decreased.

Capacity: Under the assumption that nonlinearities are chosen such that the augmented F_P matrix is of full rank, the number of vectors which can be stored in the layered APNN is equal to the sum of the number of neurons in the input and hidden layers. Note, then, that interconnects between the input and output neurons are not needed if there is a sufficiently large number of neurons in the hidden layer.

6. GENERALIZATION

We are assured that the APNN will converge to the desired result if a portion of a training vector is used to stimulate the network. What, however, will be the response if an initialization is used that is not in the training set or, in other words, how does the network generalize from the training set?

To illustrate generalization, we return to the XOR problem. Let s_5(M) denote the state of the output neuron at the Mth (synchronous) iteration. If s_1 and s_2 denote the clamped input values, then s_5(m+1) = t_15 s_1 + t_25 s_2 + t_35 s_3 + t_45 s_4 + t_55 s_5(m), where s_3 = exp(s_1 + s_2) and s_4 = cos[pi (s_1 + s_2)/2]. To reach steady state, we let m tend to infinity and solve for s_5(infinity):

    s_5(infinity) = [ t_15 s_1 + t_25 s_2 + t_35 s_3 + t_45 s_4 ] / ( 1 - t_55 )        (10)

A plot of s_5(infinity) versus (s_1, s_2) is shown in Figure 3. The plot goes through one and zero according to the XOR of the corner coordinates. Thresholding Figure 3 at 3/4 results in the generalization perspective plot shown in Figure 4.

Figure 3. Response of the elementary XOR APNN using an exponential and a trigonometric nonlinearity in the hidden layer. Note that, at the corners, the function is equal to the XOR of the corner coordinates.

Figure 4. The generalization of the XOR networks formed by thresholding the function in Fig. 3 at 3/4. Different hidden layer nonlinearities result in different generalizations.

To analyze the network's generalization when there is more than one output neuron, we use (5), of which (10) is a special case. If conditions are such that there is one step convergence, then generalization plots of the type in Figure 4 can be computed from one network iteration using (7).
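Equation (10) gives the whole generalization surface in closed form. A short sketch (our construction, rebuilding the augmented XOR library from section 5) evaluates it at the four input corners and applies the 3/4 threshold of Fig. 4:

```python
import numpy as np

F = np.array([[0., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])
x = F[0] + F[1]
F_plus = np.vstack([F[:2], np.exp(x), np.cos(np.pi * x / 2), F[2:]])
T = F_plus @ np.linalg.inv(F_plus.T @ F_plus) @ F_plus.T

def s5_inf(s1, s2):
    """Steady-state output state for clamped inputs (s1, s2), eq. (10)."""
    s3, s4 = np.exp(s1 + s2), np.cos(np.pi * (s1 + s2) / 2)
    top = T[0, 4]*s1 + T[1, 4]*s2 + T[2, 4]*s3 + T[3, 4]*s4
    return top / (1.0 - T[4, 4])

for s1, s2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y = s5_inf(s1, s2)
    print(s1, s2, round(float(y), 3), y > 0.75)   # corners reproduce the XOR truth table
```

Evaluating s5_inf over a grid of (s1, s2) reproduces a surface like Fig. 3; different hidden nonlinearities reshape the surface and hence the thresholded decision regions of Fig. 4.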
7. NOTES

(a) There clearly exists a great amount of freedom in the choice of the nonlinearities in the hidden layer. Their effect on the network performance is currently not well understood. One can envision, however, choosing nonlinearities to enhance some network attribute such as interconnect reduction, classification region shaping (generalization) or convergence acceleration.

(b) There is a possibility that, for a given set of hidden neuron nonlinearities, augmentation of the F_P matrix will coincidentally result in a matrix of deficient column rank; proper convergence is then not assured. It may also result in a poorly conditioned matrix; convergence will then be quite slow. A practical solution to these problems is to pad the hidden layer with additional neurons. As we have noted, this will also improve the convergence rate.

(c) We have shown in section 4 that if an APNN has a single bipolar output neuron, the network converges in one step in the sense of (7). Visualize a layered APNN with a single output neuron. If there is a sufficiently large number of neurons in the hidden layer, then the input layer does not need to be connected to the output layer. Consider a second neural network identical to the first in the input and hidden layers except that the hidden to output interconnects are different. Since the two networks differ only in the output interconnects, they can be combined into a single network with two output neurons. The interconnects from the hidden layer to the output neurons are identical to those used in the single output neuron architectures. The new network will also converge in one step. This process can clearly be extended to an arbitrary number of output neurons.

REFERENCES

1. R.J. Marks II, "A Class of Continuous Level Associative Memory Neural Nets," Appl. Opt., vol. 26, no. 10, p. 2005, 1987.

2. K.F. Cheung et al., "Neural Net Associative Memories Based on Convex Set Projections," Proc. IEEE 1st International Conf. on Neural Networks, San Diego, 1987.

3. R.J. Marks II et al., "A Class of Continuous Level Neural Nets," Proc. 14th Congress of the International Commission for Optics, Quebec, Canada, 1987.

4. J.J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Nat. Acad. of Sciences, USA, vol. 79, p. 2554, 1982.

5. J.J. Hopfield et al., "Neural Computation of Decisions in Optimization Problems," Biol. Cybern., vol. 52, p. 141, 1985.

6. D.W. Tank et al., "Simple Neural Optimization Networks: an A/D Converter, Signal Decision Circuit and a Linear Programming Circuit," IEEE Trans. Circuits Syst., vol. CAS-33, p. 533, 1986.

7. M. Takeda et al., "Neural Networks for Computation: Number Representation and Programming Complexity," Appl. Opt., vol. 25, no. 18, p. 3033, 1986.

8. S. Geman et al., "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Trans. Pattern Anal. & Machine Intelligence, vol. PAMI-6, p. 721, 1984.

9. S. Kirkpatrick et al., "Optimization by Simulated Annealing," Science, vol. 220, no. 4598, p. 671, 1983.

10. D.H. Ackley et al., "A Learning Algorithm for Boltzmann Machines," Cognitive Science, vol. 9, p. 147, 1985.
11. K.F. Cheung et al., "Synchronous vs. Asynchronous Behaviour of Hopfield's CAM Neural Net," to appear in Applied Optics.

12. R.P. Lippmann, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, p. 7, Apr. 1987.

13. N. Farhat et al., "Optical Implementation of the Hopfield Model," Appl. Opt., vol. 24, p. 1469, 1985.

14. L.E. Atlas, "Auditory Coding in Higher Centers of the CNS," IEEE Eng. in Medicine and Biology Magazine, p. 29, Jun. 1987.

15. Y.S. Abu-Mostafa et al., "Information Capacity of the Hopfield Model," IEEE Trans. Inf. Theory, vol. IT-31, p. 461, 1985.

16. R.J. McEliece et al., "The Capacity of the Hopfield Associative Memory," IEEE Trans. Inf. Theory (submitted), 1986.

17. D.E. Rumelhart et al., Parallel Distributed Processing, vols. I & II, Bradford Books, Cambridge, MA, 1986.

18. D.E. Rumelhart et al., "Learning Representations by Back-Propagating Errors," Nature, vol. 323, no. 6088, p. 533, 1986.

19. R.J. Marks II et al., "Alternating Projection Neural Networks," ISDL report #11587, Nov. 1987 (submitted for publication).

20. D.C. Youla et al., "Image Restoration by the Method of Convex Projections: Part I - Theory," IEEE Trans. Med. Imaging, vol. MI-1, p. 81, 1982.

21. M.I. Sezan and H. Stark, "Image Restoration by the Method of Convex Projections: Part II - Applications and Numerical Results," IEEE Trans. Med. Imaging, vol. MI-1, p. 95, 1982.

22. M. Minsky et al., Perceptrons, MIT Press, Cambridge, MA, 1969.

23. J. Sklansky et al., Pattern Classifiers and Trainable Machines, Springer-Verlag, New York, 1981.
", "award": [], "sourceid": 77, "authors": [{"given_name": "Robert", "family_name": "Marks", "institution": null}, {"given_name": "Les", "family_name": "Atlas", "institution": null}, {"given_name": "Seho", "family_name": "Oh", "institution": null}, {"given_name": "James", "family_name": "Ritcey", "institution": null}]}