Michael C. Mozer
Structure in a visual scene can be described at many levels of granularity. At a coarse level, the scene is composed of objects; at a finer level, each object is made up of parts, and the parts of subparts. In this work, I propose a simple principle by which such hierarchical structure can be extracted from visual scenes: Regularity in the relations among different parts of an object is weaker than in the internal structure of a part. This principle can be applied recursively to define part-whole relationships among elements in a scene. The principle does not make use of object models, categories, or other sorts of higher-level knowledge; rather, part-whole relationships can be established based on the statistics of a set of sample visual scenes. I illustrate with a model that performs unsupervised decomposition of simple scenes. The model can account for the results from a human learning experiment on the ontogeny of part-whole relationships.
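To make the principle concrete, the following toy sketch groups scene elements into parts using only statistics gathered over sample scenes. It is not the model described in this paper. It assumes, purely for illustration, that each scene is a set of named point elements, that "regularity" can be approximated by low variance in the relative offset between two elements across scenes, and that a fixed threshold separates within-part from between-part relations. The names `make_scene`, `offset_variance`, and `group_elements` are hypothetical.

```python
import random
from itertools import combinations
from statistics import pvariance


def make_scene(rng):
    """Hypothetical toy scene: two rigid two-element parts placed independently."""
    ax, ay = rng.uniform(0, 10), rng.uniform(0, 10)
    bx, by = rng.uniform(0, 10), rng.uniform(0, 10)
    return {
        "a1": (ax, ay), "a2": (ax + 1.0, ay),   # part A: fixed internal layout
        "b1": (bx, by), "b2": (bx, by + 1.0),   # part B: fixed internal layout
    }


def offset_variance(scenes, p, q):
    """Variance of the offset from element p to element q across scenes.

    Low variance is used here as a stand-in for a highly regular relation.
    """
    dx = [s[q][0] - s[p][0] for s in scenes]
    dy = [s[q][1] - s[p][1] for s in scenes]
    return pvariance(dx) + pvariance(dy)


def group_elements(scenes, threshold):
    """Merge elements whose pairwise relation is regular (variance below threshold)."""
    names = sorted(scenes[0])
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the representative of n's group.
        while parent[n] != n:
            n = parent[n]
        return n

    for p, q in combinations(names, 2):
        if offset_variance(scenes, p, q) < threshold:
            parent[find(q)] = find(p)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


rng = random.Random(0)
scenes = [make_scene(rng) for _ in range(200)]
parts = group_elements(scenes, threshold=0.01)
print(parts)
```

In this sketch the within-part offsets never change, so their variance is essentially zero, while offsets between elements of different parts vary widely from scene to scene. Raising the threshold step by step and regrouping at each step would be one way to approximate the recursive application mentioned above, again as an assumption rather than the procedure used in the paper.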