{"title": "Learning invariant features using the Transformed Indian Buffet Process", "book": "Advances in Neural Information Processing Systems", "page_first": 82, "page_last": 90, "abstract": "Identifying the features of objects becomes a challenge when those features can change in their appearance. We introduce the Transformed Indian Buffet Process (tIBP), and use it to define a nonparametric Bayesian model that infers features that can transform across instantiations. We show that this model can identify features that are location invariant by modeling a previous experiment on human feature learning. However, allowing features to transform adds new kinds of ambiguity: Are two parts of an object the same feature with different transformations or two unique features? What transformations can features undergo? We present two new experiments in which we explore how people resolve these questions, showing that the tIBP model demonstrates a similar sensitivity to context to that shown by human learners when determining the invariant aspects of features.", "full_text": "Learning invariant features using\n\nthe Transformed Indian Buffet Process\n\nJoseph L. Austerweil\n\nDepartment of Psychology\n\nThomas L. Grif\ufb01ths\n\nDepartment of Psychology\n\nUniversity of California, Berkeley\n\nUniversity of California, Berkeley\n\nBerkeley, CA 94720\n\nBerkeley, CA 94720\n\nJoseph.Austerweil@gmail.com\n\nTom Griffiths@berkeley.edu\n\nAbstract\n\nIdentifying the features of objects becomes a challenge when those features can\nchange in their appearance. We introduce the Transformed Indian Buffet Process\n(tIBP), and use it to de\ufb01ne a nonparametric Bayesian model that infers features\nthat can transform across instantiations. We show that this model can identify\nfeatures that are location invariant by modeling a previous experiment on human\nfeature learning. 
However, allowing features to transform adds new kinds of am-\nbiguity: Are two parts of an object the same feature with different transformations\nor two unique features? What transformations can features undergo? We present\ntwo new experiments in which we explore how people resolve these questions,\nshowing that the tIBP model demonstrates a similar sensitivity to context to that\nshown by human learners when determining the invariant aspects of features.\n\n1\n\nIntroduction\n\nOne way the human brain manages the massive amount of sensory information it receives is by\nlearning invariants \u2014 regularities in its input that do not change across many stimuli sharing some\nproperty of interest. Learning and using invariants is essential to many aspects of cognition and\nperception [1]. For example, the retinal image of an object1 changes with viewpoint and location, yet\npeople can still identify the object. One explanation for this capability is the visual system recognizes\nthat the features of an object can occur differently across presentations, but will be transformed\nin a few predictable ways. Representing objects in terms of invariant features poses a challenge\nfor models of feature learning. From a computational perspective, unsupervised feature learning\ninvolves recognizing regularities that can be used to compactly encode the observed stimuli [2].\nWhen features have the same appearance and location, techniques such as factorial learning [3] or\nvarious extensions of the Indian Buffet Process (IBP) [4] have been successful at learning features,\nand show some correspondence to human performance [5]. Unfortunately, invariant features do not\nalways have the same appearance or location, by de\ufb01nition. 
Despite this, people are able to identify\ninvariant features (e.g., [6]), meaning that new machine learning methods need to be explored to\nfully understand human behavior.\n\nWe propose an extension to the IBP called the Transformed Indian Buffet Process (tIBP), which\ninfers features that vary across objects. Analogous to how the Transformed Dirichlet Process extends\nthe Dirichlet Process [7], the tIBP associates a parameter with each instantiation of a feature that\ndetermines how the feature is transformed in the given image. This allows for unsupervised learning\nof features that are invariant in location, size, or orientation. After de\ufb01ning the generative model for\nthe tIBP and presenting a Gibbs sampling inference algorithm, we show that this model can learn\nvisual features that are location invariant by modeling previous behavioral results (from [6]).\n\n1We talk about objects, images, and scenes having features depending on the context.\n\n1\n\n\f(a)\n\n(b)\n\n+\n\nFigure 1: Ambiguous representations. (a) Does this object have one feature that contains two vertical\nbars or two features that each contain one vertical bar? (b) Are these two shapes the same? The shape\non the left is typically perceived as a square and the shape on the right is typically perceived as a\ndiamond despite being objectively equivalent after a transformation (a 45 degree rotation).\n\nOne new issue that arises from inferring invariant features is that it can be ambiguous whether parts\nof an image are the same feature with different transformations or different features. For example,\nan object containing two vertical bars has (at least) two representations: a single feature containing\ntwo vertical bars a \ufb01xed distance apart, or two features each of which is a vertical bar with its own\ntranslational transformation (see Figure 1 (a)). The tIBP suggests an answer to this question: pick\nthe smallest feature representation that can encode all observed objects. 
By presenting objects that contain either two vertical bars a \ufb01xed distance apart that vary in position, or two vertical bars varying independently in location, we con\ufb01rm in a behavioral experiment that people use sets of objects to infer invariant features and that the different feature representations lead to different decisions.\n\nIntroducing transformational invariance also raises the question of what kinds of transformations a feature can undergo. A classic demonstration of the dif\ufb01culty of de\ufb01ning a set of permissible transformations is the Mach square/diamond [8]. Are the two shapes in Figure 1 (b) the same? The shape on the right is typically perceived as a diamond while the shape on the left is seen as a square, despite being identical except for a rotational transformation. We extend the tIBP to include variables that select the transformations each feature is allowed to undergo. This raises the question of whether people can infer the permissible transformations of a feature. We demonstrate that this is the case by showing that people vary in their generalizations from a square to a diamond depending on whether the square is shown in the context of other squares that vary in rotation. This provides an interesting new explanation of the Mach square/diamond: People learn the allowed transformations of features for a given shape, not what transformations of features are allowed over all shapes.\n\n2 Unsupervised feature learning using nonparametric Bayesian statistics\n\nOne common approach to unsupervised learning is to explicitly de\ufb01ne the generative process that created the observed data. Latent structure can then be identi\ufb01ed by inverting this process using Bayesian inference. Nonparametric Bayesian models can be used in this way to infer latent structure of potentially unbounded dimensionality [9]. 
The Indian Buffet Process (IBP) [4] is a stochastic process that can be used as a prior in nonparametric Bayesian models where each object is represented using an unknown but potentially in\ufb01nite set of latent features.\n\n2.1 Learning features using the Indian Buffet Process\n\nThe standard treatment of feature learning using nonparametric Bayesian models factors the observations into two latent structures: (1) a binary matrix Z that denotes which objects have each feature, and (2) a matrix Y that represents how the features are instantiated. If there are N objects and K features, then Z is an N \u00d7 K binary matrix (where object n has feature k if z_{nk} = 1) and Y is a K \u00d7 D matrix (where D is the dimensionality of the observed properties of each object, e.g., the number of pixels in an image). The IBP de\ufb01nes a probability distribution over Z when K \u2192 \u221e such that only a \ufb01nite number of the columns are non-zero (with probability 1 for \ufb01nite N). This distribution is\n\nP(Z) = (\u03b1^{K_+} / \u220f_{h=1}^{2^N\u22121} K_h!) exp{\u2212\u03b1 H_N} \u220f_{k=1}^{K_+} [(N \u2212 m_k)! (m_k \u2212 1)! / N!]   (1)\n\nwhere \u03b1 is a parameter affecting the number of non-zero entries in the matrix, K_h is the number of features with history h (the history is the corresponding column of each feature, interpreted as a binary number), K_+ is the number of columns with non-zero entries, H_N is the N-th harmonic number, and m_k is the number of objects that have feature k. Typically, a simple parametric model is used for Y (Gaussian for generating real-valued observations, or Bernoulli for binary observations).\n\nThe observed properties of objects can be summarized in an N \u00d7 D matrix X. The vector x_n representing the properties of object n is generated based on its features z_n and the matrix Y. This can be done using a linear-Gaussian likelihood for real-valued properties [4], or a noisy-OR for binary properties [10]. 
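As a concrete illustration of Equation 1, the IBP prior can be evaluated in a few lines of code. The sketch below is ours, not the authors' implementation; the function name ibp_log_prior and the numpy layout (Z as an N x K array) are our own choices.

```python
import numpy as np
from math import lgamma, log

def ibp_log_prior(Z, alpha):
    """Log of the IBP prior (Equation 1) for a binary feature matrix Z (N x K).

    Our own sketch, not the authors' implementation.
    """
    N, K = Z.shape
    m = Z.sum(axis=0)                    # m_k: number of objects with feature k
    nonzero = np.where(m > 0)[0]
    K_plus = len(nonzero)                # K_+: non-empty columns
    H_N = sum(1.0 / i for i in range(1, N + 1))   # N-th harmonic number

    # K_h!: count the columns sharing the same "history"
    # (a column read as a binary number)
    counts = {}
    for k in nonzero:
        h = tuple(int(v) for v in Z[:, k])
        counts[h] = counts.get(h, 0) + 1
    log_Kh_fact = sum(lgamma(c + 1) for c in counts.values())

    logp = K_plus * log(alpha) - log_Kh_fact - alpha * H_N
    for k in nonzero:
        mk = int(m[k])
        # log of (N - m_k)! (m_k - 1)! / N! via log-gamma
        logp += lgamma(N - mk + 1) + lgamma(mk) - lgamma(N + 1)
    return logp
```

For instance, with N = 2 and a single feature owned by the first object only, the prior reduces to log(alpha) - alpha*H_2 + log(1!0!/2!) for alpha = 1.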
All of the modeling results in this paper use the noisy-OR, with\n\np(x_{nd} = 1 | Z, Y) = 1 \u2212 (1 \u2212 \u03bb)^{z_n \u00b7 y_d} (1 \u2212 \u01eb)   (2)\n\nwhere x_{nd} is the dth observed property of the nth object, and y_d is the corresponding column of Y.\n\n2.2 The Transformed Indian Buffet Process (tIBP)\n\nFollowing Sudderth et al.\u2019s [7] extension of the Dirichlet Process, the Transformed Indian Buffet Process (tIBP) allows features to be transformed. The transformations are object-speci\ufb01c, so in a sense, when an object takes a feature, the feature is transformed with respect to the object. Let g(Y|\u03b2) be a prior probability distribution on Y parameterized by \u03b2, \u03a6(\u03b7) be a distribution over a set of transformations parameterized by \u03b7, r_n be a vector of transformations of the feature instantiations for object n, and f(x_n | r_n(Y), z_n, \u03b3) be the data distribution, where \u03b3 is any other parameter used in the data distribution. The following generative process de\ufb01nes the tIBP:\n\nZ | \u03b1 \u223c IBP(\u03b1)\nY | \u03b2 \u223c g(\u03b2)\nr_{nk} | \u03b7 \u223c_iid \u03a6(\u03b7)\nx_n | r_n, z_n, Y, \u03b3 \u223c f(x_n | r_n(Y), z_n, \u03b3)\n\nIn this paper, we focus on binary images where the transformations are drawn uniformly at random from a \ufb01nite set (though Section 5.1 uses a slightly more complicated distribution). The reason for this (instead of using a Dirichlet process over transformations) is that we are interested in modeling invariances in translation, size, or rotation, and to model images where a feature occurs in a novel translation, size, or rotation, these transformations must have non-zero probability. In this section, we focus on translations. Assuming our data are in {0, 1}^{D_1 \u00d7 D_2}, a translation shifts the starting place of its feature in each dimension by r_{nk} = (d_1, d_2). We assume a discrete uniform prior on shifts: r_{nk} \u223c U{0, . . . , D_1 \u2212 1} \u00d7 U{0, . . . , D_2 \u2212 1}. 
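To make the generative process concrete, the following sketch forward-samples binary images from the translation-only model described above. It is our illustration, not the authors' code: the sequential "buffet" construction of Z, Bernoulli(p) feature images, and cyclic np.roll translations (one simple choice of boundary handling) are our own shortcuts, and the function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tibp_images(N, D1, D2, alpha=0.8, lam=0.99, eps=0.05, p=0.4):
    """Forward-sample N binary D1 x D2 images from a translation-only tIBP.

    A sketch of the generative process in the text, not the authors' code.
    """
    Y = []                                   # feature images
    m = []                                   # m_k: objects owning feature k so far
    Z_rows = []
    X = np.zeros((N, D1, D2), dtype=int)
    for n in range(N):
        # Sequential IBP: take old dishes with prob m_k / (n + 1), then new ones.
        z = [rng.random() < mk / (n + 1) for mk in m]
        for _ in range(rng.poisson(alpha / (n + 1))):
            Y.append((rng.random((D1, D2)) < p).astype(int))
            m.append(0)
            z.append(True)
        effect = np.zeros((D1, D2))
        for k, zk in enumerate(z):
            if zk:
                m[k] += 1
                r = (rng.integers(D1), rng.integers(D2))   # uniform shift r_nk
                effect += np.roll(Y[k], r, axis=(0, 1))    # cyclic translation
        # Noisy-OR (Equation 2): effect counts active feature pixels per location.
        p_on = 1 - (1 - lam) ** effect * (1 - eps)
        X[n] = rng.random((D1, D2)) < p_on
        Z_rows.append(z)
    return X, Y, Z_rows
```

Running this with small images produces a stack of binary images whose "on" pixels are translated copies of the sampled features plus noise.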
Each transformation results in a new interpretation of the feature, r_n(y_d). The likelihood p(x_{nd} = 1 | Z, Y, R) is then identical to Equation 2, substituting the vector of transformed feature interpretations r_n(y_d) for y_d.\n\n2.3 Inference by Gibbs sampling\n\nWe sample from the posterior distribution on feature assignments Z, feature interpretations Y, and transformations R given observed properties X using Gibbs sampling [11]. The algorithm consists of iteratively drawing each variable conditioned on the current values of all other variables.\n\nFor features with m_k > 0 (after removal of the current value of z_{nk}), we draw z_{nk} by marginalizing over transformations. This avoids a bottleneck in sampling, as otherwise we would have to get lucky in drawing the right feature and transformation. The marginalization can be done directly, with\n\np(z_{nk} | Z_{\u2212(nk)}, R_{\u2212(nk)}, Y, X) = \u2211_{r_{nk}} p(z_{nk} | Z_{\u2212(nk)}, R, Y, X) p(r_{nk})   (3)\n\nwhere the \ufb01rst term on the right hand side is proportional to p(x_n | z_n, Y, R) p(z_{nk} | Z_{\u2212(nk)}) (provided by the likelihood and the IBP prior respectively, with Z_{\u2212(nk)} being all of Z except z_{nk}), and the second term is uniform over all r_{nk}. If z_{nk} = 1, we then sample r_{nk} from\n\np(r_{nk} | z_{nk} = 1, Z_{\u2212(nk)}, R_{\u2212(nk)}, Y, X) \u221d p(x_n | z_n, Y, R) p(r_{nk})   (4)\n\nwhere the relevant probabilities are also used in computing Equation 3, and can thus be cached.\n\nWe follow Wood et al.\u2019s [10] method for drawing new features (i.e., features for which currently m_k = 0). First, we draw an auxiliary variable K_n^{new}, the number of \u201cnew\u201d features, from\n\np(K_n^{new} | x_n, Z_{n,1:(K+K_n^{new})}, Y, R) \u221d p(x_n | Z^{new}, Y, K_n^{new}) P(K_n^{new})   (5)\n\nwhere Z^{new} is Z augmented with K_n^{new} new columns containing ones in row n. From the IBP, we know that K_n^{new} \u223c Poisson(\u03b1/N) [4]. 
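Before turning to the new-feature term, the collapsed update for existing features (Equations 3 and 4) can be sketched in code. This is our own illustration under a hypothetical data layout (X as N binary D1 x D2 images, Y as a list of feature images, R as a dictionary mapping (n, k) to a shift), not the authors' implementation; the parameter defaults match the values used later in the paper.

```python
import numpy as np

def noisyor_loglik(x, effect, lam=0.99, eps=0.05):
    """log p(x | ...) under the noisy-OR of Equation 2, where effect[d1, d2]
    counts the active (transformed) feature pixels at each image location."""
    p_on = 1 - (1 - lam) ** effect * (1 - eps)
    return float(np.sum(np.where(x == 1, np.log(p_on), np.log(1 - p_on))))

def gibbs_znk(n, k, Z, Y, R, X, lam=0.99, eps=0.05, rng=None):
    """One Gibbs draw of z_nk with r_nk marginalized out (Equations 3 and 4).

    A sketch, not the authors' code. Singleton features (m_k = 0 without
    object n) are handled by the separate new-feature step, so here they
    are simply dropped.
    """
    rng = rng or np.random.default_rng()
    N, D1, D2 = X.shape
    m_k = int(Z[:, k].sum() - Z[n, k])       # feature popularity without object n
    if m_k == 0:
        Z[n, k] = 0
        return
    # Effect of object n's other features under their current transformations.
    base = np.zeros((D1, D2))
    for j in range(Z.shape[1]):
        if j != k and Z[n, j]:
            base += np.roll(Y[j], R[(n, j)], axis=(0, 1))
    shifts = [(d1, d2) for d1 in range(D1) for d2 in range(D2)]
    # Equation 3: p(z_nk = 1 | ...) is proportional to the IBP prior term
    # (m_k / N) times the likelihood averaged over the uniform shifts.
    lik1 = [np.exp(noisyor_loglik(X[n], base + np.roll(Y[k], s, axis=(0, 1)), lam, eps))
            for s in shifts]
    p1 = (m_k / N) * np.mean(lik1)
    p0 = (1 - m_k / N) * np.exp(noisyor_loglik(X[n], base, lam, eps))
    Z[n, k] = int(rng.random() < p1 / (p1 + p0))
    if Z[n, k]:
        # Equation 4, reusing the likelihood terms cached from Equation 3.
        probs = np.array(lik1) / np.sum(lik1)
        R[(n, k)] = shifts[rng.choice(len(shifts), p=probs)]
```

Note how the per-shift likelihoods computed for Equation 3 are reused directly when sampling r_nk, exactly the caching the text describes.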
To compute the \ufb01rst term on the right hand side, we need to marginalize over the possible new feature images and their transformations (Y_{(K+1):(K+K_n^{new})} and R_{n,(K+1):(K+K_n^{new})}). We assume that the \ufb01rst object to take a feature takes it in its canonical form and thus it is not transformed. Since the \ufb01rst transformation of a feature and its interpretation in an image are not identi\ufb01able, this assumption is valid and necessary to aid in inference. With no transformations, drawing the new features in the noisy-OR tIBP model is equivalent to drawing the new features in the normal noisy-OR IBP model. Thus, we can use the same sampling step for K_n^{new} as [10]. Let Z^{new} = Z_{n,1:(K+K_n^{new})}. Continuing the previous equation,\n\np(K_n^{new} | . . .) \u221d p(K_n^{new}) \u220f_d p(x_{nd} | Z^{new}, Y, R, K_n^{new})   (6)\n\n= (\u03b1^{K_n^{new}} e^{\u2212\u03b1} / K_n^{new}!) \u220f_d (1 \u2212 (1 \u2212 \u01eb)(1 \u2212 \u03bb)^{z_n \u00b7 r_n(y_d)} (1 \u2212 p\u03bb)^{K_n^{new}})   (7)\n\nwhere r_n(y_d) is the vector of transformed feature interpretations along observed dimension d.\n\nFinally, to complete each Gibbs sweep we resample the feature interpretations (Y) given the state of the other variables. We sample each y_{kd} independently given the state of the other variables, with\n\np(y_{kd} | X, Z, R, Y_{\u2212(kd)}) \u221d p(X | Y, Z, R) p(y_{kd})   (8)\n\nwhere p(X | Y, Z, R) is the likelihood, given by the noisy-OR function.\n\n2.4 Prediction\n\nTo compare the feature representations our model infers to behavioral results, we need judgments from the model for new test objects. This is a prediction problem: computing the probability of a new object x_{N+1} given the set of N observed objects X. We can express this as\n\nP(x_{N+1} | X) = \u2211_{Z,Y,R} P(x_{N+1}, Z, Y, R | X) = \u2211_{Z,Y,R} P(x_{N+1} | Z, Y, R) P(Z, Y, R | X).   (9)\n\nThe Gibbs sampling algorithm gives us samples from P(Z, Y, R | X) that can be used to approximate this sum. 
However, a further approximation is required to compute P(x_{N+1} | Z, Y, R). For each sweep of Gibbs sampling, we sample a vector of features z_{N+1} and corresponding transformations r_{N+1} for a new object from their conditional distribution given the values of Z, Y, and R in that sweep, under the constraint that no new features are generated. We use these samples to approximate the calculation of P(x_{N+1} | Z, Y, R) by marginalizing over z_{N+1} and r_{N+1}.\n\n3 Demonstration: Learning Translation Invariant Features\n\nIn many situations learners need to form a feature representation of a set of objects, and the features do not reoccur in the exact same location. A common strategy for dealing with this problem is to pre-process data to build in the relevant invariances, or simply to tabulate the presence or absence of features without trying to infer them from the data (e.g., [12]). The tIBP provides a way for a learner to discover that features are translation invariant, and to infer them directly from the data.\n\nFiser and colleagues [6, 12] showed that when two parts of an image always occur together (forming a \u201cbase pair\u201d), people expect the two parts to occur together as if they had one feature representing the pair. In Experiments 1 and 2 of [6], participants viewed 144 scenes, where each scene contained three of the six base pairs in varied spatial locations. Each base pair was two of twelve parts in a particular spatial arrangement. Afterwards, participants chose which of two images was more familiar: a base pair (in a never-before-seen location) or a pair of parts that occurred together at least once (but were not a base pair). Participants strongly preferred the base pair. To demonstrate the ability of the tIBP to infer translation invariant features that are made up of complex parts, we trained the model on scenes with the same structure as those shown to participants. 
The only difference was to lower the dimensionality of the images by recoding each part to be a 3 by 3 pixel image (the images from [6] were 1200 by 900 pixels). Figure 2 (a) shows the basic parts (grouped into their base pairs), while 2 (b) shows one scene given to the model. Figure 2 (c) shows the features inferred by the tIBP model (one sample from the Gibbs sampler after 1000 iterations with a 50 iteration burn-in), given the 144 scenes.\n\nFigure 2: Learning translation invariant features. (a) Each of the parts used to form base pairs, with base pairs grouped in rectangles. (b) One example scene. (c) Features inferred by the tIBP model (one sample from the Gibbs sampler). The tIBP infers the base pairs as features.\n\nThe parameters were initialized to \u03b1 = 0.8, \u01eb = 0.05, \u03bb = 0.99, and p = 0.4. The model reconstructs the base pairs used to generate the images, and learns that the base pairs can occur in any location. To compare the model to people\u2019s familiarity judgments, we calculated the model\u2019s predictive probability for each base pair in a new location and for a part in that base pair paired with another part that co-occurred with it at least once (but not in a base pair). Over all comparisons, the tIBP model gave higher probability to the image containing the base pair.\n\n4 Experiment 1: One feature or two features transformed?\n\nA new problem arises out of learning features that can transform. Is an image composed of the same feature multiple times with different instantiations, or is it composed of different features that may or may not be transformed? One way to decide between two possible feature representations for an object is to pick the features that allow you to encode the object and the other objects it is associated with. For example, the object from Figure 1 (a) is the \ufb01rst object (from the top left) in the two sets of objects shown in Figure 3. 
Figure 3 (a) is the unitized object set. All of the objects in this\nset can be represented as translations of one feature that is two vertical bars. Although this object\nset can also be described in terms of two features (each of which are vertical bars that can each\ntranslate independently), it is a surprising coincidence that the two vertical bars are always the same\ndistance apart over all of the objects in the set. Figure 3 (b) is the separate object set. This set is best\nrepresented in terms of two features, where each is a vertical bar.\n\nUsing different feature representations leads to different predictions about what other objects should\nbe expected to be in the set. Representing the objects with a single feature containing two vertical\nbars predicts new objects that have vertical bars where the two bars are the same distance apart (New\nUnitized). These objects are also expected under the feature representation that is two features that\nare each vertical bars; however, any object with two vertical bars is expected (New Separate) \u2014 not\njust those with a particular distance apart. Thus, interpreting objects with different feature repre-\nsentations has consequences for how to generalize set membership. In the following experiment,\nwe test these predictions by asking people after viewing either the unitized or separate object sets\nto judge how likely the New Unitized or New Separate objects are to be part of the object set they\nviewed. We then compare the behavioral results to the features inferred by the tIBP model and the\npredictive probability of each of the test objects given each of the object sets.\n\n(a)\n\n(b)\n\nFigure 3: Training sets for Experiment 1. (a) Objects made from spatial translations of the unitized\nfeature. (b) Objects made from spatial translations of two separate features. 
The number of times each vertical bar is present is the same in the two object sets.\n\n[Figure 4 appears here: two bar plots, \u201cHuman Experiment 1 Results\u201d (vertical axis: Human Rating) and \u201cModel Predictions for Experiment 1\u201d (vertical axis: Model Activation), each showing the Unitized (Unit) and Separate (Sep) conditions over the test images Seen Both, Seen Unit, Seen Sep, New Unit, New Sep, 1 Bar, Unit + 1 Bar, 3 Sep Bars, and Diag.]\n\nFigure 4: Results of Experiment 1. (a) Human judgments. The unitized group only rated those images with two vertical bars close together highly. The separate group rated any image with two vertical bars highly. (b) The predictions by the tIBP model.\n\n4.1 Methods\n\nA total of 40 participants were recruited online and compensated a small amount. Three participants were removed for failing to complete the task, leaving 19 and 18 participants in the separate and unitized conditions respectively. There were two phases to the experiment: training and test. In the training phase, participants read this cover story (adapted from [13]): \u201cRecently a Mars rover found a cave with a collection of different images on its walls. A team of scientists believes the images could have been left by an alien civilization. The scientists are hoping to understand the images so they can \ufb01nd out about the civilization.\u201d They then looked through the eight images (which were either the unitized or separate object set, in a random order) and scrolled down to the next section once they were ready for the test phase. At that point, they were informed that there were many more images on the cave wall that the rover had not yet had a chance to record. 
Their task for the test phase was to rate, on a scale from 0 to 6, how likely they believed the rover would be to see each image as it explored further through the cave. There were nine test images presented in a random order: Seen Both (an image in both training sets), Seen Unit (an image that only the unitized group saw), Seen Sep (an image only the separate group saw), New Unit (an image valid under the unitized feature set), New Sep (an image valid under the separate feature set), and four other images that acted as controls (the images are under the horizontal axes of Figure 4).\n\n4.2 Results\n\nFigure 4 (a) shows the average ratings made by participants in each group for the nine test images. Over the nine test images, the separate group rated the Seen Sep (t(35) = 6.40, p < 0.001) and New Sep (t(35) = 5.43, p < 0.001) objects higher than the unitized group, but otherwise the groups did not rate any of the other test images signi\ufb01cantly differently. As predicted by the above analysis, the unitized group believed the Mars rover was likely to encounter the two images it observed and the New Unit image (the unitized feature in a new horizontal position), but did not think it would encounter the other objects. The separate group rated any image with two vertical bars highly. This indicates that they represent the images using two features, each containing a single vertical bar varying in horizontal position. Thus, each group of participants inferred a set of features invariant over the set of observed objects (taking into account the different horizontal positions of the features in each object).\n\nFigure 4 (b) shows the predictions made by the tIBP model when given each object set. The predictive probabilities for the test objects were calculated using the procedure outlined above (with the parameter values from Section 3), using 1000 iterations of Gibbs sampling and a 50 iteration burn-in. 
A non-linear monotonic transformation of these probabilities was used for visualization, raising the unnormalized probabilities to the power of 0.05 and renormalizing.\n\nFigure 5: Stimuli for investigating how different types of invariances are learned for different object classes. (a) The rotation training set. (b) The size training set. (c) Two new objects for testing the inferred type of invariance: a New Rotation and a New Size object.\n\nThe Spearman\u2019s rank order correlation between the model\u2019s predictions and human judgments is 0.85. Qualitatively, the model\u2019s predictions are good; however, the model incorrectly predicts that the separate condition should rate the 1 Bar test image highly. Unlike the participants in the separate condition, the model does not infer that each object has two features, and so an object with only one feature is not judged unlikely. This suggests that while learning the feature representation for a set of objects, people also learn the number of features each object typically has. Investigating how people infer expectations about the number of features objects have is an interesting phenomenon that demands further study.\n\n5 Experiment 2: Learning the type of invariance\n\nA natural next step for improving the tIBP would be to make the set of transformations \u03a6 larger and thus extend the number of possible invariants that can be learned. Although this may be appropriate from a machine learning perspective, it is inappropriate for understanding human cognition. Recall the Mach square/diamond example in Figure 1 (b). Many shapes are equivalent when rotated; however, rotational invariance does not hold for all shapes. 
This example teaches a counterintuitive\nmoral: The best approach is not to include as many transformations as possible into the model.\n\nThough rotations are not valid transformations for what people commonly consider to be squares,\nthey are appropriate for many objects. This suggests that people infer the set of allowable transfor-\nmations for different classes of objects. Given the three objects in Figure 5 (a) (the rotation set) it\nseems clear that the New Rotation object in Figure 5 (c) belongs in the set, but not the New Size\nobject. The reverse holds for the three objects from the left of Figure 5 (b), the size set. To explore\nthis phenomenon, we \ufb01rst extend the tIBP to infer the appropriate set of transformations by intro-\nducing latent variables for each feature that indicate which transformations it is allowed to use. We\ndemonstrate this extension to the tIBP predicts the New Rotation object when given the rotation set\nand predicts the New Size object when given the size set \u2014 effectively learning the appropriate type\nof invariance for a given object class. Finally, we con\ufb01rm our introspective argument that people\ninfer the type of invariance appropriate to the observed class of objects.\n\n5.1 Learning invariance type using the tIBP\n\nIt is straightforward to modify the tIBP such that the type of transformations allowed on a feature is\ninferred as well. This is done by introducing a hidden variable for each feature that indicate the type\nof transformation allowed for that feature. Then, the feature transformation is generated conditioned\non this hidden variable from a probability distribution speci\ufb01c to the transformation type.\n\nThe experiment in this section is learning whether or not the feature de\ufb01ning a set of objects is\neither rotation or size invariant. 
Formally, we model this using a generative process that is the same as the tIBP, but introduces a latent variable t_k which determines the type of transformation allowed by feature k. If t_k = 1, then rotational transformations are drawn from \u03a6_\u03c1 (the discrete uniform distribution ranging in multiples of \ufb01fteen degrees from zero to 45). If t_k = 0, then size transformations are drawn from \u03a6_\u03c3 (the discrete uniform distribution over [3/8, 3/7, 3/5, 5/7, 1, 7/5, 11/7, 5/3, 11/5, 7/3, 11/3]). We assume t_k \u223c_iid Bernoulli(\u03c0).\n\nThe inference algorithm for this extension is the same as for the tIBP except we need to infer the values of t_k. We draw t_k using a Gibbs sampling scheme while marginalizing over r_{1k}, . . . , r_{nk},\n\np(t_k | X, Y, Z, R_{\u2212k}, t_{\u2212k}) \u221d \u2211_{r_{nk}} p(x_n | r_{nk}, t_k, Y, Z, R_{\u2212k}, t_{\u2212k}) p(r_k | t_k) p(t_k).   (10)\n\n[Figure 6 appears here: two bar plots, \u201cHuman Responses to Experiment 2\u201d (vertical axis: Human Rating) and \u201cModel Predictions for Experiment 2\u201d (vertical axis: Model Activation), showing the Rotation (Rot) and Size conditions over the test images Seen Both, Seen Rot, Seen Size, New Rot, and New Size.]\n\nFigure 6: Results of Experiment 2. (a) Responses of human participants. (b) Model predictions.\n\nPrediction is as above except t_k gives the set of transformations each feature is allowed to take.\n\n5.2 Methods\n\nA total of 40 participants were recruited online and compensated a small amount, with 20 participants in each training condition (rotation and size). 
The cover story from Experiment 1 was used. Participants observed the three objects in their training set and then generalized on a scale from 0 to 6 to \ufb01ve test objects: Same Both (the object that is in both training sets), Same Rot (the last object of the rotation set), Same Size (the last object of the size set), New Rot, and New Size.\n\n5.3 Results\n\nFigure 6 (a) shows the average human judgments. As expected, participants in the rotation condition generalized more to the New Rot object than those in the size condition (unpaired t(38) = 4.44, p < 0.001) and vice versa for the New Size object (unpaired t(38) = 5.34, p < 0.001). This con\ufb01rms our hypothesis: people infer the appropriate set of transformations (a subset of all transformations) that features are allowed to use for a class of objects. Figure 6 (b) shows the model predictions with parameters set to \u03b1 = 2, \u01eb = 0.01, \u03bb = 0.99, p = 0.5, and \u03c0 = 0.5, using the same visualization technique as in Experiment 1 (with T = 0.005), run for 1000 iterations (with a burn-in of 50 iterations) on the sets of images (downsampled to 38 by 38 pixels). Qualitatively, the extended tIBP model has nearly the same pattern of results as the participants in the experiment. The only issue is that it gives high probability to the Same Size object when given the rotation set, an artifact of downsampling. 
The Spearman\u2019s rank order correlation between the model\u2019s predictions and human judgments is 0.68. Importantly, the model predicts that only when given the rotation set should participants generalize to the New Rot object, and only when given the size set should they generalize to the New Size object.\n\n6 Conclusions and Future Directions\n\nIn this paper, we presented an account of how people infer feature representations that are invariant over transformations, and in two behavioral experiments con\ufb01rmed two predictions of a new model of human unsupervised feature learning. In addition to these contributions, we proposed a \ufb01rst sketch of a new computational theory of shape representation: the features representing an object are transformed relative to the object, and the set of transformations a feature is allowed to undergo depends on the object\u2019s context. In the future, we would like to pursue this theory further, expanding the account of learning the types of transformations and exploring how the transformations between features in an object interact (we should expect some interaction due to real-world constraints on the transformations, e.g., projective geometry). Finally, we hope to include other facets of visual perception in our model, like a perceptually realistic prior on feature instantiations and feature relations (e.g., the horizontal bar is always ON TOP OF the vertical bar).\n\nAcknowledgements We thank Karen Schloss, Stephen Palmer, and the Computational Cognitive Science Lab at Berkeley for discussions, and AFOSR grant FA-9550-10-1-0232 and NSF grant IIS-0845410 for support.\n\nReferences\n\n[1] S. E. Palmer. Vision Science. MIT Press, Cambridge, MA, 1999.\n[2] H. Barlow. Unsupervised learning. Neural Computation, 1:295\u2013311, 1989.\n[3] Z. Ghahramani. Factorial learning and the EM algorithm. 
In Advances in Neural Information Processing Systems, volume 7, pages 617\u2013624, Cambridge, MA, 1995. MIT Press.\n[4] T. L. Grif\ufb01ths and Z. Ghahramani. In\ufb01nite latent feature models and the Indian buffet process. Technical Report 2005-001, Gatsby Computational Neuroscience Unit, 2005.\n[5] J. L. Austerweil and T. L. Grif\ufb01ths. Analyzing human feature learning as nonparametric Bayesian inference. In Daphne Koller, Yoshua Bengio, Dale Schuurmans, and L\u00e9on Bottou, editors, Advances in Neural Information Processing Systems, volume 21, Cambridge, MA, 2009. MIT Press.\n[6] J. Fiser and R. N. Aslin. Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological Science, 12(6), 2001.\n[7] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. Describing visual scenes using transformed Dirichlet processes. In Advances in Neural Information Processing Systems 18, Cambridge, MA, 2006. MIT Press.\n[8] E. Mach. The analysis of sensations. Open Court, Chicago, 1914/1959.\n[9] M. I. Jordan. Bayesian nonparametric learning: Expressive priors for intelligent systems. In Heuristics, Probability and Causality: A Tribute to Judea Pearl. College Publications, 2010.\n[10] F. Wood, T. L. Grif\ufb01ths, and Z. Ghahramani. A non-parametric Bayesian method for inferring hidden causes. In Proceedings of the 22nd Conference on Uncertainty in Arti\ufb01cial Intelligence, 2006.\n[11] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721\u2013741, 1984.\n[12] G. Orban, J. Fiser, R. N. Aslin, and M. Lengyel. Bayesian learning of visual chunks by human observers. Proceedings of the National Academy of Sciences, 105(7):2745\u20132750, 2008.\n[13] J. L. Austerweil and T. L. Grif\ufb01ths. 
The effect of distributional information on feature learning.\nIn Proceedings of the Thirty-First Annual Conference of the Cognitive Science Society. 2009.\n\n9\n\n\f", "award": [], "sourceid": 437, "authors": [{"given_name": "Joseph", "family_name": "Austerweil", "institution": null}, {"given_name": "Thomas", "family_name": "Griffiths", "institution": null}]}