{"title": "A Bilinear Model for Sparse Coding", "book": "Advances in Neural Information Processing Systems", "page_first": 1311, "page_last": 1318, "abstract": "", "full_text": "A Bilinear Model for Sparse Coding\n\nDavid B. Grimes and Rajesh P. N. Rao\n\nDepartment of Computer Science and Engineering\n\nUniversity of Washington\n\nSeattle, WA 98195-2350, U.S.A.\n\n grimes,rao\n\n@cs.washington.edu\n\nAbstract\n\nRecent algorithms for sparse coding and independent component analy-\nsis (ICA) have demonstrated how localized features can be learned from\nnatural images. However, these approaches do not take image transfor-\nmations into account. As a result, they produce image codes that are\nredundant because the same feature is learned at multiple locations. We\ndescribe an algorithm for sparse coding based on a bilinear generative\nmodel of images. By explicitly modeling the interaction between im-\nage features and their transformations, the bilinear approach helps reduce\nredundancy in the image code and provides a basis for transformation-\ninvariant vision. We present results demonstrating bilinear sparse coding\nof natural images. We also explore an extension of the model that can\ncapture spatial relationships between the independent features of an ob-\nject, thereby providing a new framework for parts-based object recogni-\ntion.\n\n1 Introduction\n\nAlgorithms for redundancy reduction and ef\ufb01cient coding have been the subject of con-\nsiderable attention in recent years [6, 3, 4, 7, 9, 5, 11]. Although the basic ideas can be\ntraced to the early work of Attneave [1] and Barlow [2], recent techniques such as indepen-\ndent component analysis (ICA) and sparse coding have helped formalize these ideas and\nhave demonstrated the feasibility of ef\ufb01cient coding through redundancy reduction. 
These techniques produce an efficient code by attempting to minimize the dependencies between elements of the code through appropriate constraints.

One of the most successful applications of ICA and sparse coding has been in the area of image coding. Olshausen and Field showed that sparse coding of natural images produces localized, oriented basis filters that resemble the receptive fields of simple cells in primary visual cortex [6, 7]. Bell and Sejnowski obtained similar results using their algorithm for ICA [3]. However, these approaches do not take image transformations into account. As a result, the same oriented feature is often learned at different locations, yielding a redundant code. Moreover, the presence of the same feature at multiple locations prevents more complex features from being learned and leads to a combinatorial explosion when one attempts to scale the approach to large image patches or hierarchical networks.

In this paper, we propose an approach to sparse coding that explicitly models the interaction between image features and their transformations. A bilinear generative model is used to learn both the independent features in an image and their transformations. Our approach extends Tenenbaum and Freeman's work on bilinear models for learning content and style [12] by casting the problem within a probabilistic sparse coding framework. Thus, whereas prior work on bilinear models used global decomposition methods such as SVD, the approach presented here emphasizes the extraction of local features by removing higher-order redundancies through sparseness constraints. We show that for natural images, this approach produces localized, oriented filters that can be translated by different amounts to account for image features at arbitrary locations.
Our results demonstrate how an image can be factored into a set of basic local features and their transformations, providing a basis for transformation-invariant vision. We conclude by discussing how the approach can be extended to allow parts-based object recognition, wherein an object is modeled as a collection of local features (or "parts") and their relative transformations.

2 Bilinear Generative Models

We begin by considering the standard linear generative model used in algorithms for ICA and sparse coding [3, 7, 9]:

z = Σ_{i=1}^{m} w_i x_i    (1)

where z is a k-dimensional input vector (e.g. an image), w_i is a k-dimensional basis vector, and x_i is its scalar coefficient. Given the linear generative model above, the goal of ICA is to learn the basis vectors w_i such that the x_i are as independent as possible, while the goal in sparse coding is to make the distribution of the x_i highly kurtotic given Equation 1.

The linear generative model in Equation 1 can be extended to the bilinear case by using two independent sets of coefficients x_i and y_j (or equivalently, two vectors x and y) [12]:

z = Σ_{i=1}^{m} Σ_{j=1}^{n} w_ij x_i y_j    (2)

The coefficients x_i and y_j jointly modulate a set of basis vectors w_ij to produce an input vector z. For the present study, the coefficient x_i can be regarded as encoding the presence of object feature i in the image, while the y_j values determine the transformation present in the image. In the terminology of Tenenbaum and Freeman [12], x describes the "content" of the image while y encodes its "style."

Equation 2 can also be expressed as a linear equation in x for a fixed y:

z = Σ_{i=1}^{m} ( Σ_{j=1}^{n} w_ij y_j ) x_i = Σ_{i=1}^{m} w_i^y x_i    (3)

Likewise, for a fixed x, one obtains a linear equation in y. Indeed, this is the definition of bilinear: given one fixed factor, the model is linear with respect to the other factor. The power of bilinear models stems from the rich non-linear interactions that can be represented by varying both x and y simultaneously.

3 Learning Sparse Bilinear Models

3.1 Learning Bilinear Models

Our goal is to learn from image data an appropriate set of basis vectors w_ij that effectively describe the interactions between the feature vector x and the transformation vector y. A commonly used approach in unsupervised learning is to minimize the sum of squared pixel-wise errors over all images:

E(w, x, y) = || z − ẑ ||²    (4)

where the reconstruction ẑ is given by

ẑ = Σ_{i=1}^{m} Σ_{j=1}^{n} w_ij x_i y_j    (5)

and || · || denotes the L2 norm of a vector, with the error summed over all input images z. A standard approach to minimizing such a function is to use gradient descent, alternating between minimization with respect to (x, y) and minimization with respect to w_ij. Unfortunately, the optimization problem as stated is underconstrained. The function E has many local minima, and results from our simulations indicate that convergence is difficult in many cases. There are many different ways to represent an image, making it difficult for the method to converge to a basis set that can generalize effectively.
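As a concrete illustration, the bilinear synthesis of Equation 2, its fixed-y linear form (Equation 3), and the squared-error term of Equation 4 can be sketched as follows (a minimal NumPy sketch; the tensor shapes and function names are our own illustrative choices, not part of the published implementation):

```python
import numpy as np

def reconstruct(W, x, y):
    """Bilinear synthesis (Equation 2): z_hat = sum_ij w_ij * x_i * y_j.

    W has shape (m, n, k): m object features, n transformation values,
    and k pixels per image patch, so each w_ij = W[i, j] is a basis vector.
    """
    return np.einsum("ijk,i,j->k", W, x, y)

def effective_basis(W, y):
    """For a fixed y the model is linear in x (Equation 3):
    w_i^y = sum_j w_ij * y_j, returned as an (m, k) array."""
    return np.einsum("ijk,j->ik", W, y)

def squared_error(z, W, x, y):
    """Squared pixel-wise reconstruction error (Equation 4) for one image."""
    r = z - reconstruct(W, x, y)
    return float(r @ r)
```

For a fixed y, `reconstruct(W, x, y)` coincides with `x @ effective_basis(W, y)`, which is exactly the bilinearity property noted above.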
A related approach is presented by Tenenbaum and Freeman [12]. Rather than using gradient descent, their method estimates the parameters directly by computing the singular value decomposition (SVD) of a matrix containing input data corresponding to each content class x in every style y. Their approach can be regarded as an extension of methods based on principal component analysis (PCA) applied to the bilinear case. The SVD approach avoids the difficulties of convergence that plague the gradient descent method and is much faster in practice. Unfortunately, the learned features tend to be global and non-localized, similar to those obtained from PCA-based methods based on second-order statistics. As a result, the method is unsuitable for the problem of learning local features of objects and their transformations.

The underconstrained nature of the problem can be remedied by imposing constraints on x and y. In particular, we could cast the problem within a probabilistic framework and impose specific prior distributions on x and y with higher probabilities for values that achieve certain desirable properties. We focus here on the class of sparse prior distributions, for several reasons: (a) by forcing most of the coefficients to be zero for any given input, sparse priors minimize redundancy and encourage statistical independence between the various x_i and between the various y_j [7]; (b) there is growing evidence for sparse representations in the brain: the distribution of neural responses in visual cortical areas is highly kurtotic, i.e. a cell exhibits little activity for most inputs but responds vigorously for a few inputs, causing a distribution with a high peak near zero and long tails; (c) previous approaches based on sparseness constraints have obtained encouraging results [7]; and (d) enforcing sparseness on the x_i encourages the parts and local features shared across objects to be learned, while imposing sparseness on the y_j allows object transformations to be explained in terms of a small set of basic transformations.

3.2 Bilinear Sparse Coding

We assume the following priors for the x_i and y_j:

P(x_i) = (1/Z_α) e^{−α S(x_i)}    (6)

P(y_j) = (1/Z_β) e^{−β S(y_j)}    (7)

where Z_α and Z_β are normalization constants, α and β are parameters that control the degree of sparseness, and S is a "sparseness function." For this study, we used S(a) = log(1 + a²).
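As a small illustration, the sparseness function above and the resulting penalty (the negative log of the priors in Equations 6 and 7, up to the normalization constant) can be written as:

```python
import numpy as np

def S(a):
    """Sparseness function used in this study: S(a) = log(1 + a^2)."""
    return np.log1p(np.square(a))

def sparseness_penalty(coeffs, weight):
    """Negative log-prior of a coefficient vector (Equations 6-7), up to
    the constant log Z: weight * sum_i S(coeffs_i). The weight plays the
    role of alpha (for x) or beta (for y)."""
    return weight * float(np.sum(S(np.asarray(coeffs))))
```

Because S grows only logarithmically, a code that concentrates its energy in a few large coefficients is penalized less than one that spreads the same energy over many small coefficients, which is what drives solutions toward sparseness.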
\n\n\u0015!\u000f\n\u000f\u0016\u0015\u0017\u0010\u0019\u0018\n\nminimizing the following optimization function over all input images:\n\n\u0007 summed over all images\n\u0018 and \u000f\n\u0018 can be used\n\u0018 .\n\n, or equivalently, minimize the negative\n. Under certain reasonable assumptions (discussed in [7]), this is equivalent to\n\n\u0015\u0017\u0010\u0014\u0018 (see, for example, [7]). The priors \u000f\n\u0006\u001a\u0007\n\nWithin a probabilistic framework, the squared error function \ncan be interpreted as representing the negative log likelihood of the data given the parame-\nters: \u0003\nto marginalize this likelihood to obtain the new likelihood function: \r\nlog of \n\nThe goal then is to \ufb01nd the\t\n\u0006\u001a\u0007\n\u0006\u001a\u0007\n\u0006\u001a\u0007\nGradient descent can be used to derive update rules for the components\n\n\u0001 and\u000e\u0003\u0002 of the\nrespectively for any image , assuming a \ufb01xed\nfeature vector\u000f and transformation vector\u0010\nbasis\t\n\u0004\u0003\n\u0006\u001a\u0007\n\u0006\u001a\u0007\n\u0006\u0014\u0007\n\u0006\u001a\u0007\n\u0006\u001a\u0007$\t\n\u0006\u001a\u0007\nGiven a training set of inputs\u0003\u000e , the values for\u000f and\u0010\ncan be used to update the basis set\t\n\r\u000e\n\u0006\u001a\u0007\n\u0005 and\u000e\nof the\n\n\u001c were maintained at a \ufb01xed desired level.\n\nwithout bound, we adapted the \r\n4 Results\n\n\u0005 norm of each basis vector in such a way that the variances\n\nAs suggested by Olshausen and Field [7], in order to keep the basis vectors from growing\n\nfor each image after convergence\n\n(8)\n\n(9)\n\n(10)\n\n(11)\n\n\u0001\t\b\n\n\u0006\u001a\u0007\n\n\u0004\u0003\u0005\n\u0004\u0003\u0005\n\n\u0004\u0003\u0005\n\n&\u000b\n\n&\f\n\n\u000e\r\u0002\n\nin batch mode according to:\n\n\u0006\u001a\u0007$\t\n\n4.1 Training Paradigm\n\nWe tested the algorithms for bilinear sparse coding on natural image data. 
The natural images we used are distributed by Olshausen and Field [7], along with the code for their algorithm. The training set consisted of image patches randomly extracted from ten source images. The images are pre-whitened to equalize large variances in frequency and thus speed convergence. We chose to use a complete basis, with m equal to the number of pixels per patch, and we let n be at least as large as the number of transformations (including the no-transformation case). In order to assist convergence, all learning occurred in batch mode over batches of image patches, with the sparseness parameters α and β and the step size for gradient descent using Equation 11 held at fixed values. The transformations were chosen to be 2D translations in the range [−3, 3] pixels in both axes. The style/content separation was enforced by learning a single x vector to describe an image patch regardless of its translation, and likewise a single y vector to describe a particular style given any image patch content.

4.2 Bilinear Sparse Coding of Natural Images

Figure 1 shows the results of training on natural image data.
A comparison between the learned features for the linear generative model (Equation 1) and the bilinear model is provided in Figure 1 (a). Although both show simple, localized, and oriented features, the bilinear method is able to model the same features under different transformations. In this case, seven horizontal translations in the range [−3, 3] pixels were used in the training of the bilinear model. Figure 1 (b) provides an example of how the bilinear sparse coding model encodes a natural image patch and the same patch after it has been translated. Note that both the x and y vectors are sparse.

Figure 1: Representing natural images and their transformations with a sparse bilinear model. (a) A comparison of learned features between a standard linear model and a bilinear model, both trained with the same sparseness priors. The two rows for the bilinear case depict the translated object features w_i^y (see Equation 3) for translations of −3 to 3 pixels. (b) The representation of an example natural image patch, and of the same patch translated to the left. Note that the bar plot representing the x vector is indeed sparse, having only three significant coefficients. The code for the style vectors for both the canonical patch and the translated one is likewise sparse. The w_ij basis images are shown for those dimensions which have non-zero coefficients for x_i or y_j.

Figure 2 shows how the model can account for a given localized feature at different locations by varying the y vector. As shown in the last column of the figure, the translated local feature is generated by linearly combining a sparse set of basis vectors w_ij.

The bilinear generative model in Equation 2 uses the same set of transformation values y_j for all the features i.
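The feature translation shown in Figure 2 amounts to re-synthesizing a fixed feature under a new y (Equation 3); a toy sketch, with a hypothetical one-hot y selecting one learned transformation:

```python
import numpy as np

def translated_feature(W, i, y):
    """Effective basis vector w_i^y = sum_j w_ij y_j (Equation 3) for
    feature i, with W of shape (m, n, k). Changing only y moves the
    same learned feature to a different location."""
    return np.einsum("jk,j->k", W[i], y)
```

With a one-hot y this simply selects the basis image for that transformation; a sparse mixture of y_j values instead combines a few basis vectors, as in the last column of Figure 2.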
Such a model is appropriate for global transformations that apply to an entire image region, such as a shift of the whole image patch by some number of pixels or a global illumination change.

Figure 2: Translating a learned feature to multiple locations. The two rows of eight images represent the individual basis vectors w_ij for two values of i (features i = 57 and i = 32). The y_j values for two selected transformations for each i are shown as bar plots; y(p, q) denotes a translation of (p, q) pixels in the Cartesian plane. The last column shows the resulting basis vectors after translation.

4.3 Towards Parts-Based Object Recognition

Consider the problem of representing an object in terms of its constituent parts. In this case, we would like to be able to transform each part independently of the other parts in order to account for the location, orientation, and size of each part in the object image. The standard bilinear model can be extended to address this need as follows:

z = Σ_{i=1}^{m} Σ_{j=1}^{n} w_ij x_i y_j^i    (12)

Note that each object feature i now has its own set of transformation values y_j^i; the double summation is thus no longer symmetric. Also note that the standard model (Equation 2) is a special case of Equation 12 where y^i = y for all i.

We have conducted preliminary experiments to test the feasibility of Equation 12 using a set of object features learned for the standard bilinear model. Figure 3 shows the results. These results suggest that allowing independent transformations for the different features provides a rich substrate for modeling images and objects in terms of a set of local features (or parts) and their individual transformations.

Figure 3: Modeling independently transformed features. (a) shows the standard bilinear method of generating a translated feature by combining basis vectors w_ij using the same set of y_j values for two different features (i = 57 and i = 81). (b) shows four examples of images generated by allowing different values of y_j for the two different features. Note the significant differences between the resulting images, which cannot be obtained using the standard bilinear model.

5 Summary and Conclusion

A fundamental problem in vision is to simultaneously recognize objects and their transformations [8, 10]. Bilinear generative models provide a tractable way of addressing this problem by factoring an image into object features and transformations using a bilinear equation. Previous approaches used unconstrained bilinear models and produced global basis vectors for image representation [12]. In contrast, recent research on image coding has stressed the importance of localized, independent features derived from metrics that emphasize the higher-order statistics of inputs [6, 3, 7, 5]. This paper introduces a new probabilistic framework for learning bilinear generative models based on the idea of sparse coding.

Our results demonstrate that bilinear sparse coding of natural images produces localized, oriented basis vectors that can simultaneously represent features in an image and their transformation. We showed how the learned generative model can be used to translate a basis vector to different locations, thereby reducing the need to learn the same basis vector at multiple locations as in traditional sparse coding methods. We also proposed an extension of the bilinear model that allows each feature to be transformed independently of other features. Our preliminary results suggest that such an approach could provide a flexible platform for adaptive parts-based object recognition, wherein objects are described by a set of independent, shared parts and their transformations. The importance of parts-based methods has long been recognized in object recognition in view of their ability to handle a combinatorially large number of objects by combining parts and their transformations. Few methods, if any, exist for learning representations of object parts and their transformations directly from images. Our ongoing efforts are therefore focused on deriving efficient algorithms for parts-based object recognition based on the combination of bilinear models and sparse coding.

Acknowledgments

This research is supported by NSF grant no. 133592 and a Sloan Research Fellowship to RPNR.

References

[1] F. Attneave. Some informational aspects of visual perception. Psychological Review, 61(3):183–193, 1954.

[2] H. B. Barlow. Possible principles underlying the transformation of sensory messages. In W. A. Rosenblith, editor, Sensory Communication, pages 217–234.
Cambridge, MA: MIT Press, 1961.

[3] A. J. Bell and T. J. Sejnowski. The 'independent components' of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997.

[4] G. E. Hinton and Z. Ghahramani. Generative models for discovering sparse distributed representations. Philosophical Transactions of the Royal Society B, 352:1177–1190, 1997.

[5] M. S. Lewicki and T. J. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337–365, 2000.

[6] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.

[7] B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.

[8] R. P. N. Rao and D. H. Ballard. Development of localized oriented receptive fields by learning a translation-invariant code for natural images. Network: Computation in Neural Systems, 9(2):219–234, 1998.

[9] R. P. N. Rao and D. H. Ballard. Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive field effects. Nature Neuroscience, 2(1):79–87, 1999.

[10] R. P. N. Rao and D. L. Ruderman. Learning Lie groups for invariant visual perception. In Advances in Neural Information Processing Systems 11, pages 810–816. Cambridge, MA: MIT Press, 1999.

[11] O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819–825, August 2001.

[12] J. B. Tenenbaum and W. T. Freeman. Separating style and content with bilinear models. Neural Computation, 12(6):1247–1283, 2000.
", "award": [], "sourceid": 2200, "authors": [{"given_name": "David", "family_name": "Grimes", "institution": null}, {"given_name": "Rajesh", "family_name": "Rao", "institution": null}]}