{"title": "A Principle for Unsupervised Hierarchical Decomposition of Visual Scenes", "book": "Advances in Neural Information Processing Systems", "page_first": 52, "page_last": 58, "abstract": null, "full_text": "A Principle for Unsupervised Hierarchical Decomposition of Visual Scenes \n\nMichael C. Mozer \nDept. of Computer Science \nUniversity of Colorado \nBoulder, CO 80309-0430 \n\nABSTRACT \n\nStructure in a visual scene can be described at many levels of granularity. At a coarse level, the scene is composed of objects; at a finer level, each object is made up of parts, and the parts of subparts. In this work, I propose a simple principle by which such hierarchical structure can be extracted from visual scenes: Regularity in the relations among different parts of an object is weaker than in the internal structure of a part. This principle can be applied recursively to define part-whole relationships among elements in a scene. The principle does not make use of object models, categories, or other sorts of higher-level knowledge; rather, part-whole relationships can be established based on the statistics of a set of sample visual scenes. I illustrate with a model that performs unsupervised decomposition of simple scenes. The model can account for the results from a human learning experiment on the ontogeny of part-whole relationships. \n\n1 INTRODUCTION \n\nThe structure in a visual scene can be described at many levels of granularity. Consider the scene in Figure 1a. At a coarse level, the scene might be said to consist of stick man and stick dog. However, stick man and stick dog themselves can be decomposed further. One might describe stick man as having two components, a head and a body. The head in turn can be described in terms of its parts: the eyes, nose, and mouth. 
This sort of scene decomposition can continue recursively down to the level of the primitive visual features. Figure 1b shows a partial decomposition of the scene in Figure 1a. \n\nA scene decomposition establishes part-whole relationships among objects. For example, the mouth (a whole) consists of two parts, the teeth and the lips. If we assume that any part can belong to only one whole, the decomposition imposes a hierarchical structure over the elements in the scene. \n\nWhere does this structure come from? What makes an object an object, a part a part? I propose a simple principle by which such hierarchical structure can be extracted from visual scenes and incorporate the principle in a simulation model. The principle is based on the statistics of the visual environment, not on object models or other sorts of higher-level knowledge, or on a teacher to classify objects or their parts. \n\n2 WHAT MAKES A PART A PART? \n\nParts combine to form objects. Parts are combined in different ways to form different objects and different instances of an object. Consequently, the structural relations among different parts of an object are less regular than is the internal structure of a part. To illustrate, consider Figure 2, which depicts four instances of a box shell and lid. The components of the lid (the top and the handle) appear in a regular configuration, as do the components of the shell (the sides and base), but the relation of the lid to the shell is variable. Thus, configural regularity is an indication that components should be grouped together to form a unit. I call this the regularity principle. Other variants of the regularity principle have been suggested by Becker (1995) and Tenenbaum (1994). 
\n\nThe regularity depicted in Figure 2 is quite rigid: one component of a part always occurs in a fixed spatial position relative to another. The regularity principle can also be cast in terms of abstract relationships such as containment and encirclement. The only difference is the featural representation that subserves the regularity discovery process. In this paper, however, I address primarily regularities that are based on physical features and fixed spatial relationships. Another generalization of the regularity principle is that it can be applied recursively to suggest not only parts of wholes, but subparts of parts. \n\nAccording to the regularity principle, information is implicit in the environment that can be used to establish part-whole relationships. This information comes in the form of statistical regularities among features in a visual scene. The regularity principle does not depend on explicit labeling of parts or objects. \n\nIn contrast, Schyns and Murphy (1992, 1993) have suggested a theory of part ontogeny that presupposes explicit categorization of objects. They propose a homogeneity principle which states that \"if a fragment of a stimulus plays a consistent role in categorization, the perceptual parts composing the fragment are instantiated as a single unit in the stimulus representation in memory.\" Their empirical studies with human subjects find support for the homogeneity principle. \n\nSuperficially, the homogeneity and regularity principles seem quite different: while the homogeneity principle applies to supervised category learning (i.e., with a teacher to classify instances), the regularity principle applies to unsupervised discovery. But it is possible to transform one learning paradigm into the other. 
For example, in a category learning task, if only one category is to be learned and if the training examples are all positive instances of the category, then inducing the defining characteristics of the category is equivalent to extracting regularities in the stimulus environment. Thus, category learning in a diverse stimulus environment can be conceptualized as unsupervised regularity extraction in multiple, narrow stimulus environments (each environment being formed by taking all positive instances of a given class). \n\n[Figure 1b tree: scene branches into stick man and stick dog; stick man into head and body; head into eyes, nose, and mouth; mouth into lips and teeth; body into torso, arm, and leg.] \n\nFIGURE 1. (a) A graphical depiction of stick man and his faithful companion, stick dog; (b) a partial decomposition of the scene into its parts. \n\nFIGURE 2. Four different instances of a box with a lid. \n\nThere are several other differences between the regularity principle proposed here and the homogeneity principle of Schyns and Murphy, but they are minor. Schyns and Murphy seem to interpret \"fragment\" more narrowly as spatially contiguous perceptual features. They also don't address the hierarchical nature of part-whole relationships. Nonetheless, the two principles share the notion of using the statistical structure of the visual environment to establish part-whole relations. \n\n3 A FLAT REPRESENTATION OF STRUCTURE \n\nI have incorporated the regularity principle into a neural net that discovers part-whole relations in its environment. Neural nets, having powerful learning paradigms for unsupervised discovery, are well suited for this task. 
However, they have a fundamental difficulty representing complex, articulated data structures of the sort necessary to encode hierarchies (but see Pollack, 1988, and Smolensky, 1990, for promising advances). I thus begin by describing a novel representation scheme for hierarchical structures that can readily be integrated into a neural net. \n\nThe tree structure in Figure 1b depicts one representation of a hierarchical decomposition. The complete tree has as its leaf nodes the primitive visual features of the scene. The tree specifies the relationships among the visual features. There is another way of capturing these relationships, more connectionist in spirit than the tree structure. The idea is to assign to each primitive feature a tag, a scalar in [0, 1], such that features within a subtree have similar values. For the features of stick man, possible tags might be: eyes .1, nose .2, lips .28, teeth .32, arm .6, torso .7, leg .8. \n\nDenoting the set of all features having tags in [a, b] by S(a, b), one can specify any subtree of the stick man representation. For example, S(0, 1) includes all features of stick man; S(0, .5) includes all features in the subtree whose root is stick man's head, S(.5, 1) his body; S(.25, .35) indicates the parts of the mouth. By a simple algorithm, tags can be assigned to the leaf nodes of any tree such that any subtree can be selected by specifying an appropriate tag range. The only requirement for this algorithm is knowledge of the maximum branching factor. There is no fixed limit to the depth of the tree that can be thus represented; however, the deeper the tree, the finer the tag resolution that will be needed. \n\nThe tags provide a \"flat\" way of representing hierarchical structure. 
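The tag-assignment algorithm is not spelled out in the text; one way to realize it, assuming only that the maximum branching factor b is known, is to give each child of a node an equal sub-interval of its parent's tag interval and to tag each leaf with the midpoint of its interval. The function names (assign_tags, select) and the choice b = 3 below are illustrative, not the author's:

```python
def assign_tags(tree, lo=0.0, hi=1.0, b=3):
    """Tag every leaf with a scalar in [lo, hi); `tree` is a leaf label
    (a string) or a list of subtrees; b is the maximum branching factor."""
    if isinstance(tree, str):            # leaf: use the midpoint of its interval
        return {tree: (lo + hi) / 2.0}
    tags, width = {}, (hi - lo) / b
    for i, child in enumerate(tree):     # child i owns the i-th sub-interval
        tags.update(assign_tags(child, lo + i * width, lo + (i + 1) * width, b))
    return tags

def select(tags, a, b):
    """S(a, b): the set of features whose tags lie in [a, b]."""
    return {f for f, t in tags.items() if a <= t <= b}

# Stick man: head = (eyes, nose, mouth = (lips, teeth)), body = (torso, arm, leg)
stick_man = [["eyes", "nose", ["lips", "teeth"]], ["torso", "arm", "leg"]]
tags = assign_tags(stick_man)
head = select(tags, 0.0, 1/3)    # every feature under the head subtree
mouth = select(tags, 2/9, 1/3)   # the mouth's sub-interval within the head
```

With this scheme the tag interval of any subtree is determined by its path from the root, so any subtree can be picked out by a tag range; deeper trees simply demand finer tag resolution, as the text notes.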
Although the tree is implicit in the representation, the tags convey all information in the tree, and thus can capture complex, articulated structures. The tags in fact convey additional information. For example, in the above feature list, note that lips is closer to nose than teeth is to nose. This information can easily be ignored, but it is still worth observing that the tags carry extra baggage not present in the symbolic tree structure. \n\nIt is convenient to represent the tags on a range [0, 2π) rather than [0, 1]. This allows the tag to be identified with a directional, or angular, value. Viewed as part of a cyclic continuum, the directional tags are homogeneous, in contrast to the linear tags, where tags near 0 and 1 have special status by virtue of being at endpoints of the continuum. Homogeneity results in a more elegant model, as described below. \n\nThe directional tags also permit a neurophysiological interpretation, albeit speculative. It has been suggested that synchronized oscillatory activities in the nervous system can be used to convey information above and beyond that contained in the average firing rate of individual neurons (e.g., Eckhorn et al., 1988; Gray et al., 1989; von der Malsburg, 1981). These oscillations vary in their phase, the relative offset of the bursts. The directional tags could map directly to phases of oscillations, providing a means of implementing the tagging in neocortex. \n\n4 REGULARITY DISCOVERY \n\nMany learning paradigms allow for the discovery of regularity. I have used an autoencoder architecture (Plaut, Nowlan, & Hinton, 1986) that maps an input pattern, a representation of visual features in a scene, to an output pattern via a small layer of hidden units. 
The goal of this type of architecture is for the network to reproduce the input pattern over the output units. The task requires discovery of regularities because the hidden layer serves as an encoding bottleneck that limits the representational capacity of the system. Consequently, stronger regularities (the most common patterns) will be encoded over the weaker. \n\n5 MAGIC \n\nWe now need to combine the autoencoder architecture with the notion of tags such that regularity of feature configurations in the input will increase the likelihood that the features will be assigned the same tags. \n\nThis goal can be achieved using a model we developed for segmenting an image into different objects using supervised learning. The model, MAGIC (Mozer, Zemel, Behrmann, & Williams, 1992), was trained on images containing several visual objects, and its task was to tag features according to which object they belonged. A teacher provided the target tags. Each unit in MAGIC conveys two distinct values: a probability that a feature is present, which I will call the feature activity, and a tag associated with the feature. The tag is a directional (angular) value, of the sort suggested earlier. (The tag representation is in reality a complex number whose direction corresponds to the directional value and whose magnitude is related to the unit's confidence in the direction. As this latter aspect of the representation is not central to the present work, I discuss it no further.) \n\nThe architecture is a two-layer recurrent net. The input or feature layer is a set of spatiotopic arrays, in most simulations having dimensions 25x25, each array containing detectors for features of a given type: oriented line segments at 0°, 45°, 90°, and 135°. In addition, there is a layer of hidden units. Each hidden unit is reciprocally connected to input from a local spatial patch of the input array; in the current simulations, the patch has dimensions 4x4. For each patch there is a corresponding fixed-size pool of hidden units. To achieve a translation-invariant response across the image, the pools are arranged in a spatiotopic array in which neighboring pools respond to neighboring patches, and the patch-to-pool weights are constrained to be the same at all locations in the array. There are interlayer connections, but no intralayer connections. \n\nThe images presented to MAGIC consist of an arrangement of features over the input array. The feature activity is clamped on (i.e., the feature is present), and the initial directional tag of the feature is set at random. Feature unit activities and tags feed to the hidden units, which in turn feed back to the feature units. Through a relaxation process, the system settles on an assignment of tags to the feature units (as well as to the hidden units, although read out from the model concerns only the feature units). MAGIC is a mean-field approximation to a stochastic network of directional units with binary-gated outputs (Zemel, Williams, & Mozer, 1995). This means that a mean-field energy functional can be written that expresses the network state and controls the dynamics; consequently, MAGIC is guaranteed to converge to a stable pattern of tags. \n\nEach hidden unit detects a spatially local configuration of features, and it acts to reinstate a pattern of tags over the configuration. By adjusting its incoming and outgoing weights during training, the hidden unit is made to respond to configurations that are consistently tagged in the training set. 
For example, if the training set contains many corner junctions where horizontal and vertical lines come to a point, and if the teacher tags all features composing these lines as belonging to the same object, then a hidden unit might learn to detect this configuration and, when it does so, to force the tags of the component features to be the same. \n\n[Figure 3: five panels labeled Iteration 1, 2, 4, 6, and 11, plus a directional tag spectrum.] \n\nFIGURE 3. The state of MAGIC as processing proceeds for an image composed of a pair of lines made out of horizontal and vertical line segments. The coloring of a segment represents the directional tag. The segments belonging to a line are randomly tagged initially; over processing iterations, these tags are brought into alignment. \n\nIn our earlier work, MAGIC was trained to map the feature activity pattern to a target pattern of feature tags, where there was a distinct tag for each object in the image. In the present work, the training objective is rather to impose uniform tags over the features. Additionally, the training objective encourages MAGIC to reinstate the feature activity pattern over the feature units; that is, the hidden units must encode and propagate information back to the feature units that is sufficient to specify the feature activities (if the feature activities weren't clamped). With this training criterion, MAGIC becomes a type of autoencoder. The key property of MAGIC is that it can assign a feature configuration the same tag only if it learns to encode the configuration. If an arrangement is not encoded, there will be no force to align the feature tags. 
Further, fixed weak inhibitory connections between every pair of feature units serve to spread the tags apart if the force to align them is not strong enough. \n\nNote that this training paradigm does not require a teacher to tag features as belonging to one part or another. MAGIC will try to tag all features as belonging to the same part, but it is able to do so only for configurations of features that it is able to encode. Consequently, highly regular and recurring configurations will be grouped together, and irregular configurations will be pulled apart. The strength of grouping will be proportional to the degree of regularity. \n\n6 SIMULATION EXPERIMENTS \n\nTo illustrate the behavior of the model, I show a simple simulation in which MAGIC is trained on pairs of lines, one vertical and one horizontal. Each line is made up of 6 colinear line segments. The segments are primitive input features of the model. The two lines may appear in different positions relative to one another. Hence, the strongest regularity is in the segments that make up a line, not the junction between the lines. When trained with two hidden units, MAGIC has sufficient resources to encode the structure within each line, but not the relationships among the lines; because this structure is not encoded, the features of the two lines are not assigned the same tags (Figure 3). \n\nAlthough each \"part\" is made up of features having a uniform orientation and in a colinear arrangement, the composition and structure of the parts is immaterial; MAGIC's performance depends only on the regularity of the configurations. In the next set of simulations, MAGIC discovers regularities of a more arbitrary nature. 
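The text does not give MAGIC's update equations, so the following toy relaxation is a simplification of my own, not the model's mean-field dynamics. It illustrates the two forces just described: tags of features that a (pretend) hidden unit has learned to encode as a group are pulled into alignment, while a weak inhibition between every pair of features spreads tags apart wherever no alignment force dominates. All names and parameter values here are invented for illustration:

```python
import math, random

def relax(groups, n, steps=300, align=0.1, inhibit=0.01, seed=0):
    """Settle directional tags in [0, 2*pi) for n features; `groups` lists
    the feature-index sets that the hidden units have learned to encode."""
    rng = random.Random(seed)
    tag = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]  # random initial tags
    member = {f: g for g, fs in enumerate(groups) for f in fs}
    for _ in range(steps):
        new = []
        for i in range(n):
            force = 0.0
            for j in range(n):
                if i == j:
                    continue
                d = math.sin(tag[j] - tag[i])   # >0 pulls tag i toward tag j
                if i in member and member.get(i) == member.get(j):
                    force += align * d          # alignment within an encoded group
                force -= inhibit * d            # weak inhibition between all pairs
            new.append((tag[i] + force) % (2 * math.pi))
        tag = new
    return tag

def spread(tags):
    """1 - resultant length: near 0 when tags agree, near 1 when spread out."""
    c = sum(math.cos(t) for t in tags) / len(tags)
    s = sum(math.sin(t) for t in tags) / len(tags)
    return 1.0 - math.hypot(c, s)

# Two "lines" of six segments each: tags align within a line, not across lines.
tags = relax([set(range(6)), set(range(6, 12))], n=12)
```

Run on the two-line example, each group's tags collapse to a common direction while the pairwise inhibition pushes the two groups toward different directions, mirroring the behavior in Figure 3.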
\n\n6.1 MODELING HUMAN LEARNING OF PART-WHOLE RELATIONS \n\nSchyns and Murphy (1992) studied the ontogeny of part-whole relationships by training human subjects on a novel class of objects and then examining how the subjects decomposed the objects into their parts. I briefly describe their experiment, followed by a simulation that accounts for their results. \n\nIn the first phase of the experiment, subjects were shown 3-D gray-level \"martian rocks\" on a CRT screen. The rocks were constructed by deforming a sphere, resulting in various bumps or protrusions. Subjects watched the rocks rotating on the screen, allowing them to view the rock from all sides. Subjects were shown six instances, all of which were labeled \"M1 rocks,\" and were then tested to determine whether they could distinguish M1 rocks from other rocks. Subjects continued training until they performed correctly on this task. Every M1 rock was divided into octants; the protrusions on seven of the octants were generated randomly, and the protrusions on the last octant were the same for all M1 rocks. Two groups of subjects were studied. The A group saw M1 rocks all having part A; the B group saw M1 rocks all having part B. Following training, subjects were asked to delineate the parts they thought were important on various exemplars. Subjects selected the target part from the category on which they were trained 93% of the time, and the alternative target (the target from the other category) only 8% of the time, indicating that the learning task made a part dramatically more salient. \n\nTo model this phase of the experiment, I generated two-dimensional contours of the same flavor as Schyns and Murphy's martian rocks (Figure 4). 
Each rock, call it a \"venusian rock\" for distinction, can be divided into four quadrants or parts. Two groups of venusian rocks were generated. Rocks of category A all contained part A (left panel, Figure 4); rocks of category B contained part B (center panel, Figure 4). One network was trained on six exemplars of category A rocks, another network was trained on six exemplars of category B rocks. Then, with learning turned off, both networks were tested on five presentations each of twelve new exemplars, six each of categories A and B. \n\nJust as the human subjects were instructed to delineate parts, we must ask MAGIC to do the same. One approach would be to run the model with a test stimulus and, once it settles, select all features having directional tags clustered tightly together as belonging to the same part. However, this requires specifying and tuning a clustering procedure. To avoid this additional step, I simply compared how tightly clustered were the tags of the target part relative to those of the alternative target. I used a directional variance measure that yields a value of 0 if all tags are identical and 1 if the tags are distributed uniformly over the directional spectrum. By this measure, the variance was .30 for the target part and .68 for the alternative target (F(1,118) = 322.0, p < .001), indicating that the grouping of features of the target part was significantly stronger. This replicates, at least qualitatively, the finding of Schyns and Murphy. \n\nIn a second phase of Schyns and Murphy's experiment, subjects were trained on category C rocks, which were formed by adjoining parts A and B and generating the remaining six octants at random. Following training, subjects were again asked to delineate parts. 
\nAll subjects delineated A and B as distinct parts. In contrast, a naive group of subjects who were trained on category C alone always grouped A and B together as a single part. \n\nTo model this phase, I generated six category C venusian rocks that had both parts A and B (right panel, Figure 4). The versions of MAGIC that had been trained on category A and B rocks alone were now trained on category C rocks. As a control condition, a third version of MAGIC was trained from scratch on category C rocks alone. I compared the tightness of clustering of the combined A-B part for the first two nets to the third. Using the same variance measure as above, the nets that first received training on parts A and B alone yielded a variance of .57, and the net that was only trained on the combined A-B part yielded a variance of .47 (F(1,88) = 7.02, p < .02). One cannot directly compare the variance of the A-B part to that of the A and B parts alone, because the measure is structured such that parts with more features always yield larger variances. However, one can compare the two conditions using the relative variance of the combined A-B part to the A and B parts alone. This yielded the same outcome as before (.21 for the first two nets, .12 for the third net, F(1,88) = 5.80, p < .02). Thus, MAGIC is also able to account for the effects of prior learning on part ontogeny. \n\nFIGURE 4. Three examples of the martian rock stimuli used to train MAGIC. From left to right, the rocks are of categories A, B, and C. The lighter regions are the contours that define rocks of a given category. 
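The text states only the endpoints of the directional variance measure (0 when all tags are identical, 1 when tags are uniformly distributed). The standard circular variance, one minus the length of the mean resultant vector, has exactly these properties; it is a plausible sketch of the measure, not necessarily the one the author computed:

```python
import cmath

def directional_variance(tags):
    """Circular variance of directional tags (angles in radians):
    0 when all tags coincide, approaching 1 as tags spread uniformly
    around the directional spectrum."""
    resultant = sum(cmath.exp(1j * t) for t in tags) / len(tags)
    return 1.0 - abs(resultant)
```

Computed over the settled tags of the features making up a candidate part, a small value indicates tight grouping of that part's features, as in the .30 versus .68 comparison above.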
\n\n7 CONCLUSIONS \n\nThe regularity principle proposed in this work seems consistent with the homogeneity principle proposed earlier by Schyns and Murphy (1992, 1993). Indeed, MAGIC is able to model Schyns and Murphy's data using an unsupervised training paradigm, although Schyns and Murphy framed their experiment as a classification task. \n\nThis work is but a start at modeling the development of part-whole hierarchies based on perceptual experience. MAGIC requires further elaboration, and I am somewhat skeptical that it is sufficiently powerful in its present form to be pushed much further. The main issue restricting it is the representation of input features. The oriented-line-segment features are certainly too primitive and inflexible a representation. For example, MAGIC could not be trained to recognize the lid and shell of Figure 2 because it encodes the orientation of the features with respect to the image plane, not with respect to one another. Minimally, the representation requires some version of scale and rotation invariance. \n\nPerhaps the most interesting computational issue raised by MAGIC is how the pattern of feature tags is mapped into an explicit part-whole decomposition. This involves clustering together the similar tags as a unit, or possibly selecting all tags in a given range. To do so requires specification of additional parameters that are external to the model (e.g., how tight the cluster should be, how broad the range should be, around what tag direction it should be centered). These parameters are deeply related to attentional issues, and a current direction of research is to explore this relationship. 
\n\n8 ACKNOWLEDGEMENTS \n\nThis research was supported by NSF PYI award IRI-9058450 and grant 97-18 from the McDonnell-Pew Program in Cognitive Neuroscience. \n\n9 REFERENCES \n\nBecker, S. (1995). JPMAX: Learning to recognize moving objects as a model-fitting problem. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in Neural Information Processing Systems 7 (pp. 933-940). Cambridge, MA: MIT Press. \n\nEckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., & Reitboek, H. J. (1988). Coherent oscillations: A mechanism of feature linking in the visual cortex? Biological Cybernetics, 60, 121-130. \n\nGray, C. M., Koenig, P., Engel, A. K., & Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit intercolumnar synchronization which reflects global stimulus properties. Nature (London), 338, 334-337. \n\nMozer, M. C., Zemel, R. S., Behrmann, M., & Williams, C. K. I. (1992). Learning to segment images using dynamic feature binding. Neural Computation, 4, 650-666. \n\nPlaut, D. C., Nowlan, S., & Hinton, G. E. (1986). Experiments on learning by back propagation (Technical Report CMU-CS-86-126). Pittsburgh, PA: Carnegie-Mellon University, Department of Computer Science. \n\nPollack, J. B. (1988). Recursive auto-associative memory: Devising compositional distributed representations. In Proceedings of the Tenth Annual Conference of the Cognitive Science Society (pp. 33-39). Hillsdale, NJ: Erlbaum. \n\nSchyns, P. G., & Murphy, G. L. (1992). The ontogeny of units in object categories. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society (pp. 197-202). Hillsdale, NJ: Erlbaum. \n\nSchyns, P. G., & Murphy, G. L. (1993). The ontogeny of transformable part representations in object concepts. 
In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society (pp. 917-922). Hillsdale, NJ: Erlbaum. \n\nSmolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist networks. Artificial Intelligence, 46, 159-216. \n\nTenenbaum, J. B. (1994). Functional parts. In A. Ram & K. Eiselt (Eds.), Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society (pp. 864-869). Hillsdale, NJ: Erlbaum. \n\nvon der Malsburg, C. (1981). The correlation theory of brain function (Internal Report 81-2). Goettingen: Department of Neurobiology, Max Planck Institute for Biophysical Chemistry. \n\nZemel, R. S., Williams, C. K. I., & Mozer, M. C. (1995). Lending direction to neural networks. Neural Networks, 8, 503-512.", "award": [], "sourceid": 1482, "authors": [{"given_name": "Michael", "family_name": "Mozer", "institution": null}]}