{"title": "Learning Aspect Graph Representations from View Sequences", "book": "Advances in Neural Information Processing Systems", "page_first": 258, "page_last": 265, "abstract": null, "full_text": "258 \n\nSeibert and Waxman \n\nLearning Aspect Graph Representations from View Sequences\n\nMichael Seibert and Allen M. Waxman\n\nLincoln Laboratory, Massachusetts Institute of Technology\n\nLexington, MA 02173-9108\n\nABSTRACT\n\nIn our effort to develop a modular neural system for invariant learning and recognition of 3D objects, we introduce here a new module architecture called an aspect network constructed around adaptive axo-axo-dendritic synapses. This builds upon our existing system (Seibert & Waxman, 1989) which processes 2D shapes and classifies them into view categories (i.e., aspects) invariant to illumination, position, orientation, scale, and projective deformations. From a sequence of views, the aspect network learns the transitions between these aspects, crystallizing a graph-like structure from an initially amorphous network. Object recognition emerges by accumulating evidence over multiple views which activate competing object hypotheses.\n\n1 INTRODUCTION\n\nOne can \"learn\" a three-dimensional object by exploring it and noticing how its appearance changes. When moving from one view to another, intermediate views are presented. The imagery is continuous, unless some feature of the object appears or disappears at the object's \"horizon\" (called the occluding contour). Such visual events can be used to partition continuously varying input imagery into a discrete sequence of aspects. The sequence of aspects (and the transitions between them) can be coded and organized into a representation of the 3D object under consideration. 
This is the form of 3D object representation that is learned by our aspect network. We call it an aspect network because it was inspired by the aspect graph concept of Koenderink and van Doorn (1979). This paper introduces this new network which learns and recognizes sequences of aspects, and leaves most of the discussion of the visual preprocessing to earlier papers (Seibert & Waxman, 1989; Waxman, Seibert, Cunningham, & Wu, 1989). Presented in this way, we hope that our ideas of sequence learning, representation, and recognition are also useful to investigators concerned with speech, finite-state machines, planning, and control.\n\n1.1 2D VISION BEFORE 3D VISION\n\nFigure 1: Neural system architecture for 3D object learning and recognition. The aspect network is part of the upper-right module.\n\nThe aspect network is one module of a more complete vision system (Figure 1) introduced by us (Seibert & Waxman, 1989). The early stages of the complete system learn and recognize 2D views of objects, invariant to the scene illumination and an object's orientation, size, and position in the visual field. Additionally, projective deformations such as foreshortening and perspective effects are removed from the learned 2D representations. These processing steps make use of Diffusion-Enhancement Bilayers (DEBs)^1 to generate attentional cues and featural groupings. 
The point of our neural preprocessing is to generate a sequence of views (i.e., aspects) which depends on the object's orientation in 3-space, but which does not depend on how the 2D images happen to fall on the retina. If no preprocessing were done, then the 3D representation would have to account for every possible 2D appearance in addition to the 3D information which relates the views to each other. Compressing the views into aspects avoids such combinatorial problems, but may result in an ambiguous representation, in that some aspects may be common to a number of objects. Such ambiguity is overcome by learning and recognizing a sequence of aspects (i.e., a trajectory through the aspect graph). The partitioning and sequence recognition is analogous to building a symbol alphabet and learning syntactic structures within the alphabet. Each symbol represents an aspect and is encoded in our system as a separate category by an Adaptive Resonance Network architecture (Carpenter & Grossberg, 1987). This unsupervised learning is competitive and may proceed on-line with recognition; no separate training is required.\n\n^1 This architecture was previously called the NADEL (Neural Analog Diffusion-Enhancement Layer), but has been renamed to avoid causing any problems or confusion, since there is an active researcher in the field with this name.\n\n1.2 ASPECT GRAPHS AND OBJECT REPRESENTATIONS\n\nFigure 2 shows a simplified aspect graph for a prismatic object.^2 Each node of the graph represents a characteristic view, while the allowable transitions among views are represented by the arcs between the nodes. In this depiction, symmetries have been considered to simplify the graph. Although Koenderink and van Doorn suggested assigning aspects based on topological equivalences, we instead allow the ART 2 portion of our 2D system to decide when an invariant 2D view is sufficiently different from previously experienced views to allocate a new view category (aspect).\n\nFigure 2: Aspect Graph. A 3D object can be represented as a graph of the characteristic view-nodes with adjacent views encoded by arcs between the nodes.\n\nTransitions between adjacent aspects provide the key to the aspect network representation and recognition processes. The transitions, stored in a self-organizing synaptic weight array, become the learned view-based representation of a 3D object. Transitions are exploited again during recognition to distinguish among objects with similar views. Whereas most investigators are interested in the computational complexity of generating aspect graphs from CAD libraries (Bowyer, Eggert, Stewman, & Stark, 1989), we are interested in designing the aspect graph as a self-organizing representation, learned from visual experience and useful for object recognition.\n\n^2 Neither the aspect graph concept nor our aspect network implementation is limited to simple polyhedral objects, nor must the objects even be convex, i.e., they may be self-occluding.\n\n2 ASPECT-NETWORK LEARNING\n\nThe view-category nodes of ART 2 excite the aspect nodes (which we also call the x-nodes) of the aspect network (Figure 3). 
The aspect nodes fan out to the dendritic trees of object neurons. An object neuron consists of an adaptive synaptic array and an evidence accumulating y-node. Each object is learned by a single object neuron. A view sequence leads to accumulating activity in the y-nodes, which compete to determine the \"recognized object\" (i.e., maximally active z-node) in the \"object competition layer\". Gating signals from these nodes then modulate learning in the corresponding synaptic array, as in competitive learning paradigms. The system is designed so that the learning phase is integral with recognition. Learning (and forgetting) is always possible so that existing representations can always be elaborated with new information as it becomes available.\n\nFigure 3: Aspect Network. The learned graph representations of 3D objects are realized as weights in the synaptic arrays. Evidence for experienced view-trajectories is simultaneously accumulated for all competing objects.\n\nDifferential equations govern the dynamics and architecture of the aspect network. These shunting equations model cell membrane and synapse dynamics as pioneered by Grossberg (1973, 1989). Input activities to the network are given by equation (1), the learned aspect transitions by equation (2), and the objects recognized from the experienced view sequences by equation (3).\n\n2.1 ASPECT NODE DYNAMICS\n\nThe aspect node activities are governed by equation (1):\n\n
$$\\frac{dx_i}{dt} \\equiv \\dot{x}_i = I_i - \\lambda_x x_i, \\qquad (1)$$\n\nwhere $\\lambda_x$ is a passive decay rate, and $I_i = 1$ during the presentation of aspect $i$ and zero otherwise as determined by the output of the ART 2 module in the complete system (Figure 1). This equation assures that the activities of the aspect nodes build and decay in nonzero time (see the time-traces for the input I-nodes and aspect x-nodes in Figure 3). Whenever an aspect transition occurs, the activity of the previous aspect decays (with rate $\\lambda_x$) and the activity of the new aspect builds (again with rate $\\lambda_x$ in this case, which is convenient but not necessary). During the transient time when both activities are nonzero, only the synapses between these nodes have both pre- and post-synaptic activities which are significant (i.e., above the threshold) and Hebbian learning can be supported. The overlap of the pre- and post-synaptic activities is transient, and the extent of the transient is controlled by the selection of $\\lambda_x$. This is the fundamental parameter for the dynamical behavior of the entire network, since it defines the response time of the aspect nodes to their inputs. As such, nearly every other parameter of the network depends on it.\n\n2.2 VIEW TRANSITION ENCODING BY ADAPTIVE SYNAPSES\n\nThe aspect transitions that represent objects are realized by synaptic weights on the dendritic trees of object neurons. Equation (2) defines how the (initially small and random) weight relating aspect $i$, aspect $j$, and object $k$ changes:\n\n$$\\frac{dw_{ij}^k}{dt} \\equiv \\dot{w}_{ij}^k = \\kappa_w w_{ij}^k (1 - w_{ij}^k) \\left\\{ \\Phi_w\\left[(x_i + \\epsilon)(x_j + \\epsilon)\\right] - \\lambda_w \\right\\} \\Theta_y(\\dot{y}_k) \\Theta_z(z_k). \\qquad (2)$$\n\nHere, $\\kappa_w$ governs the rate of evolution of the weights relative to the x-node dynamics, and $\\lambda_w$ is the decay rate of the weights. 
Note that a small \"background level\" of activity $\\epsilon$ is added to each x-node activity. This will be discussed in connection with (3) below. $\\Phi_\\phi(\\gamma)$ is a threshold-linear function; that is: $\\Phi_\\phi(\\gamma) = \\gamma$ if $\\gamma > \\phi_{th}$ and zero otherwise. $\\Theta_\\theta(\\gamma)$ is a binary-threshold function of the absolute-value of $\\gamma$; that is: $\\Theta_\\theta(\\gamma) = 1.0$ if $|\\gamma| > \\theta_{th}$ and zero otherwise.\n\nAlthough this equation appears formidable, it can be understood as follows. Whenever simultaneous above-threshold activities arise presynaptically at node $x_i$ and postsynaptically at node $x_j$, the Hebbian product $(x_i + \\epsilon)(x_j + \\epsilon)$ causes $\\dot{w}_{ij}^k$ to be positive (since above threshold, $(x_i + \\epsilon)(x_j + \\epsilon) > \\lambda_w$) and the weight $w_{ij}^k$ learns the transition between the aspects $x_i$ and $x_j$. By symmetry, $w_{ji}^k$ would also learn, but all other weights decay ($\\dot{w} \\propto -\\lambda_w$). The product of the shunting terms $w_{ij}^k (1 - w_{ij}^k)$ goes to zero (and thus inhibits further weight changes) only when $w_{ij}^k$ approaches either zero or unity. This shunting mechanism limits the range of weights, but also assures that these fixed points are invariant to input-activity magnitudes, decay-rates, or the initial and final network sizes.\n\nThe gating terms $\\Theta_y(\\dot{y}_k)$ and $\\Theta_z(z_k)$ modulate the learning of the synaptic arrays $w_{ij}^k$. As a result of competition between multiple object hypotheses (see equation (4) below), only one $z_k$-node is active at a time. This implies recognition (or initial object neuron assignment) of \"Object-k,\" and so only the synaptic array of Object-k adapts. All other synaptic arrays $w_{ij}^l$ ($l \\neq k$) remain unchanged. Moreover, learning occurs only during aspect transitions. While $\\dot{y}_k$ is not 
zero, both learning and forgetting proceed; but while $\\dot{y}_k \\approx 0$ adaptation ceases though recognition continues (e.g., during a long sustained view).\n\n2.3 OBJECT RECOGNITION DYNAMICS\n\nObject nodes $y_k$ accumulate evidence over time. Their dynamics are governed by equation (3).\n\nHere, $\\kappa_y$ governs the rate of evolution of the object nodes relative to the x-node dynamics, $\\lambda_y$ is the passive decay rate of the object nodes, $\\Phi_y(\\cdot)$ is a threshold-linear function, and $\\epsilon$ is the same small positive constant as in (2). The same Hebbian-like product (i.e., $(x_i + \\epsilon)(x_j + \\epsilon)$) used to learn transitions in (2) is used to detect aspect transitions during recognition in (3) with the addition of the synaptic term $w_{ij}^k$, which produces an axo-axo-dendritic synapse (see Section 3). Using this synapse, an aspect transition must not only be detected, but it must also be a permitted one for Object-k (i.e., $w_{ij}^k > 0$) if it is to contribute activity to the $y_k$-node.\n\n2.4 SELECTING THE MAXIMALLY ACTIVATED OBJECT\n\nA \"winner-take-all\" competition is used to select the maximally active object node. The activity of each evidence accumulation y-node is periodically sampled by a corresponding object competition z-node (see Figure 3). The sampled activities then compete according to Grossberg's shunted short-term memory model (Grossberg, 1973), leaving only one z-node active at the expense of the activities of the other z-nodes. In addition to signifying the 'recognized' object, outputs of the z-nodes are used to inhibit weight adaptation of those weights which are not associated with the winning object via the $\\Theta_z(z_k)$ term in equation (2). 
The competition is given by a first-order differential equation taken from (Grossberg, 1973):\n\n(4)\n\nThe function $f(z)$ is chosen to be faster-than-linear (e.g., quadratic). The initial conditions are reset periodically to $z_k(0) = y_k(t)$.\n\n3 THE AXO-AXO-DENDRITIC SYNAPSE\n\nAlthough the learning is very closely Hebbian, the network requires a synapse that is more complex than that typically analyzed in the current modeling literature. Instead of an axo-dendritic synapse, we utilize an axo-axo-dendritic synapse (Shepard, 1979). Figure 4 illustrates the synaptic anatomy and our functional model. We interpret the structure by assuming that it is the conjunction of activities in both axons (as during an aspect transition) that best stimulates the dendrite. If, however, significant activity is present on only one axon (a sustained static view), it can stimulate the dendrite to a small extent in conjunction with the small base-level activity $\\epsilon$ present on all axons. This property supports object recognition in static scenes, though object learning requires dynamic scenes.\n\nFigure 4: Axo-axo-dendritic Synapse Model. The Hebbian-like $w_{ij}^k$-weight adapts when simultaneous axonal activities $x_i$ and $x_j$ arise. Similarly, a conjunction of both activities is necessary to significantly stimulate the dendrite to node $y_k$.\n\n4 SAMPLE RESULTS\n\nConsider two objects composed of three aspects each with one aspect in common: the first has aspects 0, 2, and 4, while the second has aspects 0, 1, and 3. Figure 5 shows the evolution of the node activities and some of the weights during two aspect sequences. 
With an initial distribution of small, random weights, we present the repetitive aspect sequence $4 \\rightarrow 2 \\rightarrow 0 \\rightarrow \\cdots$, and learning is engaged by Object-1. The attention of the system is then redirected with a saccadic eye motion (the short-term memory node activities are reset to zero) and a new repetitive aspect sequence is presented: $3 \\rightarrow 1 \\rightarrow 0 \\rightarrow \\cdots$. Since the weights for these aspect transitions in the Object-1 synaptic array decayed as it learned its sequence, it does not respond strongly to this new sequence and Object-2 wins the competition. Thus, the second sequence is learned (and recognized!) by Object-2's synaptic weight array. In these simulations (1) - (4) were implemented by a Runge-Kutta coupled differential equation integrator. Each aspect was presented for $T = 4$ time-units. The equation parameters were set as follows: $I = 1$, $\\lambda_x \\approx \\ln(0.1)/T$, $\\lambda_y \\approx 0.3$, $\\lambda_w \\approx 0.02$, $\\kappa_y \\approx 0.3$, $\\kappa_w \\approx 0.6$, $\\epsilon \\approx 0.03$, and thresholds of $\\theta_y \\approx 10^{-5}$ for $\\Theta_y(\\dot{y}_k)$ in equation (2), $\\theta_z \\approx 10^{-5}$ for $\\Theta_z(z_k)$ in equation (2), $\\phi_y > \\epsilon^2$ for $\\Phi_y$ in equation (3), $\\phi_w > \\max[\\epsilon I/\\lambda_x + \\epsilon^2, (I/\\lambda_x)^2 \\exp(-\\lambda_x T)]$ for $\\Phi_w$ in equation (2). The $\\phi_w$ constraint ensures that only transitions are learned, and they are learned only when $t < T$.\n\nFigure 5: Node activity and synapse adaptation vs. time. Two separate representations are learned automatically as aspect sequences of the objects are experienced. Panels: aspect sequence; Object-1 evidence; Object-2 evidence; Object-1 weights 0-1 and 0-2; Object-2 weights 0-1 and 0-2. 
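
As a rough illustration of equations (1) and (2) with the parameters quoted above, the following Python sketch integrates the aspect-node activities for a single pass through the sequence 4, 2, 0 and tracks three weights of one object neuron. It is a minimal sketch under stated assumptions, not the simulation reported here: it uses forward-Euler steps instead of a Runge-Kutta integrator, treats both gating terms of equation (2) as equal to 1, starts every weight at 0.05 rather than at a random value, sets the learning threshold a margin of 0.01 above the quoted bound, and takes the magnitude of ln(0.1)/T so that the aspect-node rate acts as a positive decay rate. The names simulate and phi_w_bound are hypothetical.

```python
import math

T = 4.0          # presentation time per aspect (from the text)
I_IN = 1.0       # input level while an aspect is shown (from the text)
EPS = 0.03       # background activity (from the text)
KAPPA_W = 0.6    # weight evolution rate (from the text)
LAMBDA_W = 0.02  # weight decay rate (from the text)
# Assumption: the decay rate is taken as a positive magnitude, so that an
# activity falls to 10 percent of its value after T time units.
LAMBDA_X = abs(math.log(0.1)) / T


def phi_w_bound():
    # Lower bound on the learning threshold quoted in the text.
    x_max = I_IN / LAMBDA_X
    return max(EPS * x_max + EPS ** 2, x_max ** 2 * math.exp(-LAMBDA_X * T))


def simulate(sequence, pairs, n_aspects=5, dt=0.01, w0=0.05, margin=0.01):
    # Forward-Euler integration (an assumption; the paper used Runge-Kutta).
    # Both gating terms of equation (2) are assumed to equal 1 throughout.
    phi_w = phi_w_bound() + margin
    x = [0.0] * n_aspects
    w = {pair: w0 for pair in pairs}
    peak = {pair: 0.0 for pair in pairs}
    for aspect in sequence:
        for _ in range(int(round(T / dt))):
            for i in range(n_aspects):
                inp = I_IN if i == aspect else 0.0
                x[i] += dt * (inp - LAMBDA_X * x[i])      # equation (1)
            for (i, j) in pairs:
                drive = (x[i] + EPS) * (x[j] + EPS)       # Hebbian product
                peak[(i, j)] = max(peak[(i, j)], drive)
                gated = drive if drive > phi_w else 0.0   # threshold-linear
                shunt = w[(i, j)] * (1.0 - w[(i, j)])     # keeps w in (0, 1)
                w[(i, j)] += dt * KAPPA_W * shunt * (gated - LAMBDA_W)
    return w, peak, phi_w


weights, peaks, threshold = simulate([4, 2, 0], [(4, 2), (2, 0), (4, 0)])
for pair in sorted(weights):
    # pair, final weight, and whether its Hebbian product ever crossed threshold
    print(pair, round(weights[pair], 3), peaks[pair] > threshold)
```

Under these assumptions only the pairs of successive aspects, (4, 2) and (2, 0), cross the threshold and gain weight, while the pair (4, 0), which is never in transition during this single pass, slowly decays.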
\n\nAcknowledgments\n\nThis report is based on studies performed at Lincoln Laboratory, a center for research operated by the Massachusetts Institute of Technology. The work was sponsored by the Department of the Air Force under Contract F19628-85-C-0002.\n\nReferences\n\nBowyer, K., Eggert, D., Stewman, J., & Stark, L. (1989). Developing the aspect graph representation for use in image understanding. Proceedings of the 1989 Image Understanding Workshop. Wash. DC: DARPA. 831-849.\n\nCarpenter, G. A., & Grossberg, S. (1987). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26(23), 4919-4930.\n\nGrossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52(3), 217-257.\n\nKoenderink, J. J., & van Doorn, A. J. (1979). The internal representation of solid shape with respect to vision. Biological Cybernetics, 32, 211-216.\n\nSeibert, M., & Waxman, A. M. (1989). Spreading Activation Layers, Visual Saccades, and Invariant Representations for Neural Pattern Recognition Systems. Neural Networks, 2(1), 9-27.\n\nShepard, G. M. (1979). The synaptic organization of the brain. New York: Oxford University Press.\n\nWaxman, A. M., Seibert, M., Cunningham, R., & Wu, J. (1989). Neural analog diffusion-enhancement layer and spatio-temporal grouping in early vision. In: Advances in neural information processing systems, D. S. Touretzky (ed.), San Mateo, CA: Morgan Kaufman. 289-296.\n\n", "award": [], "sourceid": 230, "authors": [{"given_name": "Michael", "family_name": "Seibert", "institution": null}, {"given_name": "Allen", "family_name": "Waxman", "institution": null}]}