{"title": "Sparse deep belief net model for visual area V2", "book": "Advances in Neural Information Processing Systems", "page_first": 873, "page_last": 880, "abstract": "Motivated in part by the hierarchical organization of cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or ``deep,'' structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up both collinear (``contour'') features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex ``corner'' features matches well with the results from Ito & Komatsu's study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.", "full_text": "Sparse deep belief net model for visual area V2

Honglak Lee

Chaitanya Ekanadham

Andrew Y. Ng

Computer Science Department
Stanford University
Stanford, CA 94305

{hllee,chaitu,ang}@cs.stanford.edu

Abstract

Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or “deep,” structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up both collinear (“contour”) features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex “corner” features matches well with the results from Ito & Komatsu’s study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.

1 Introduction

The last few years have seen significant interest in “deep” learning algorithms that learn layered, hierarchical representations of high-dimensional data [1, 2, 3, 4]. 
Much of this work appears to have been motivated by the hierarchical organization of the cortex, and indeed authors frequently compare their algorithms’ output to the oriented simple cell receptive fields found in visual area V1 (e.g., [5, 6, 2]). Indeed, some of these models are often viewed as first attempts to elucidate what learning algorithm (if any) the cortex may be using to model natural image statistics.

However, to our knowledge no serious attempt has been made to directly relate, such as through quantitative comparisons, the computations of these deep learning algorithms to areas deeper in the cortical hierarchy, such as visual areas V2, V4, etc. In this paper, we develop a sparse variant of Hinton’s deep belief network algorithm, and measure the degree to which it faithfully mimics biological measurements of V2. Specifically, we take Ito & Komatsu’s [7] characterization of V2 in terms of its responses to a large class of angled bar stimuli, and quantitatively measure the degree to which the deep belief network algorithm generates similar responses.

Deep architectures attempt to learn hierarchical structure, and hold the promise of being able to first learn simple concepts, and then successfully build up more complex concepts by composing together the simpler ones. For example, Hinton et al. [1] proposed an algorithm based on learning individual layers of a hierarchical probabilistic graphical model from the bottom up. Bengio et al. [3] proposed a similar greedy algorithm, one based on autoencoders. Ranzato et al. [2] developed an energy-based hierarchical algorithm, based on a sequence of sparsified autoencoders/decoders.

In related work, several studies have compared models such as these, as well as non-hierarchical/non-deep learning algorithms, to the response properties of neurons in area V1. 
A study by van Hateren and van der Schaaf [8] showed that the filters learned by independent components analysis (ICA) [9] on natural image data match very well with the classical receptive fields of V1 simple cells. (Filters learned by sparse coding [10, 11] similarly give responses like those of V1 simple cells.) Our work takes inspiration from the work of van Hateren and van der Schaaf, and represents a study done in a similar spirit, only extending the comparisons to a deeper area in the cortical hierarchy, namely visual area V2.

2 Biological comparison

2.1 Features in early visual cortex: area V1

The selectivity of neurons for oriented bar stimuli in cortical area V1 has been well documented [12, 13]. The receptive fields of simple cells in V1 are localized, oriented, bandpass filters that resemble Gabor filters. Several authors have proposed models that have been either formally or informally shown to replicate the Gabor-like properties of V1 simple cells. Many of these algorithms, such as [10, 9, 8, 6], compute an (approximately or exactly) sparse representation of the natural stimuli data. These results are consistent with the “efficient coding hypothesis,” which posits that the goal of early visual processing is to encode visual information as efficiently as possible [14]. Some hierarchical extensions of these models [15, 6, 16] are able to learn features that are more complex than simple oriented bars. For example, hierarchical sparse models of natural images have accounted for complex cell receptive fields [17], topography [18, 6], and collinearity and contour coding [19]. Other models, such as [20], have also been shown to give V1 complex cell-like properties.

2.2 Features in visual cortex area V2

It remains unknown to what extent the previously described algorithms can learn higher order features that are known to be encoded further down the ventral visual pathway. 
In addition, the response properties of neurons in cortical areas receiving projections from area V1 (e.g., area V2) are not nearly as well documented. It is uncertain what types of stimuli cause V2 neurons to respond optimally [21]. One V2 study by [22] reported that the receptive fields in this area were similar to those in the neighboring areas V1 and V4. The authors interpreted their findings as suggestive that area V2 may serve as a place where different channels of visual information are integrated. However, quantitative accounts of responses in area V2 are few in number. In the literature, we identified two sets of quantitative data that give us a good starting point for making measurements to determine whether our algorithms may be computing similar functions as area V2.

In one of these studies, Ito and Komatsu [7] investigated how V2 neurons responded to angular stimuli. They summarized each neuron’s response with a two-dimensional visualization of the stimuli set called an angle profile. By making several axial measurements within the profile, the authors were able to compute various statistics about each neuron’s selectivity for angle width, angle orientation, and for each separate line component of the angle (see Figure 1). Approximately 80% of the neurons responded to specific angle stimuli. They found neurons that were selective for only one line component of their peak angle, as well as neurons selective for both line components. These neurons yielded angle profiles resembling those of Cell 2 and Cell 5 in Figure 1, respectively. In addition, several neurons exhibited a high amount of selectivity for their peak angle, producing angle profiles like that of Cell 1 in Figure 1. No neurons were found that had more elongation in a diagonal axis than in the horizontal or vertical axes, indicating that neurons in V2 were not selective for angle width or orientation. 
Therefore, an important conclusion made from [7] was that a V2 neuron’s response to an angle stimulus is highly dependent on its responses to each individual line component of the angle. While the dependence was often observed to be simply additive, as was the case with neurons yielding profiles like those of Cells 2 and 5 in Figure 1(right), this was not always the case. 29 neurons had very small peak response areas and yielded profiles like that of Cell 1 in Figure 1(right), thus indicating a highly specific tuning to an angle stimulus. While the former responses suggest a simple linear computation of V1 neural responses, the latter responses suggest a nonlinear computation [21]. The analysis methods adopted in [7] are very useful in characterizing the response properties, and we use these methods to evaluate our own model.

Another study by Hegde and Van Essen [23] studied the responses of a population of V2 neurons to complex contour and grating stimuli. They found several V2 neurons responding maximally for angles, and the distribution of peak angles for these neurons is consistent with that found by [7]. In addition, several V2 neurons responded maximally for shapes such as intersections, tri-stars, five-point stars, circles, and arcs of varying length.

Figure 1: (Images from [7]; courtesy of Ito and Komatsu) Left: Visualization of angle profiles. The upper-right and lower-left triangles contain the same stimuli. (A,B) Darkened squares correspond to stimuli that elicited a large response. The peak responses are circled. (C) The arrangement of the figure is such that one line component remains constant as one moves along any vertical or horizontal axis. (D) The angle width remains constant as one moves along the diagonal indicated. (E) The angle orientation remains constant as one moves along the diagonal indicated. 
After identifying the optimal stimuli for a neuron in the profile, the number of stimuli along these various axes (as in C,D,E) eliciting responses larger than 80% of the peak response measures the neuron’s tolerance to perturbations of the line components, peak angle width, and orientation, respectively. Right: Examples of 4 typical angle profiles. As before, stimuli eliciting large responses are highlighted. Cell 1 has a selective response to a single stimulus, so there is no elongation along any axis. Cell 2 has one axis of elongation, indicating selectivity for one orientation. Cell 5 has two axes of elongation, and responds strongly so long as either of two edge orientations is present. Cell 4 has no clear axis of elongation.

3 Algorithm

Hinton et al. [1] proposed an algorithm for learning deep belief networks, by treating each layer as a restricted Boltzmann machine (RBM) and greedily training the network one layer at a time from the bottom up [24, 1]. In general, however, RBMs tend to learn distributed, non-sparse representations. Based on results from other methods (e.g., sparse coding [10, 11], ICA [9], heavy-tailed models [6], and energy based models [2]), sparseness seems to play a key role in learning Gabor-like filters. Therefore, we modify Hinton et al.’s learning algorithm to enable deep belief nets to learn sparse representations.

3.1 Sparse restricted Boltzmann machines

We begin by describing the restricted Boltzmann machine (RBM), and present a modified version of it. An RBM has a set of hidden units h, a set of visible units v, and symmetric connection weights between these two layers represented by a weight matrix W. Suppose that we want to model k-dimensional real-valued data using an undirected graphical model with n binary hidden units. 
The negative log probability of any state in the RBM is given by the following energy function:1

−log P(v, h) = E(v, h) = (1/2σ²) Σi vi² − (1/σ²) (Σi ci vi + Σj bj hj + Σi,j vi wij hj).  (1)

Here, σ is a parameter, the hj are the hidden unit variables, and the vi are the visible unit variables. Informally, the maximum likelihood parameter estimation problem corresponds to learning wij, ci and bj so as to minimize the energy of states drawn from the data distribution, and raise the energy of states that are improbable given the data.

Under this model, we can easily compute the conditional probability distributions. Holding either h or v fixed, we can sample from the other as follows:

P(vi|h) = N(ci + Σj wij hj, σ²),  (2)
P(hj|v) = logistic((1/σ²)(bj + Σi wij vi)).  (3)

Here, N(·) is the Gaussian density, and logistic(·) is the logistic function.

1Due to space constraints, we present an energy function only for the case of real-valued visible units. It is also straightforward to formulate a sparse RBM with binary-valued visible units; for example, we can write the energy function as E(v, h) = −(1/σ²)(Σi ci vi + Σj bj hj + Σi,j vi wij hj) (see also [24]).

For training the parameters of the model, the objective is to maximize the log-likelihood of the data. We also want the hidden unit activations to be sparse; thus, we add a regularization term that penalizes a deviation of the expected activation of the hidden units from a (low) fixed level p.2 Thus, given a training set {v(1), . . . , v(m)} comprising m examples, we pose the following optimization problem:

minimize{wij, ci, bj}  −Σl=1..m log Σh P(v(l), h(l)) + λ Σj=1..n | p − (1/m) Σl=1..m E[hj(l) | v(l)] |²,  (4)

where E[·] is the conditional expectation given the data, λ is a regularization constant, and p is a constant controlling the sparseness of the hidden units hj. Thus, our objective is the sum of a log-likelihood term and a regularization term. In principle, we can apply gradient descent to this problem; however, computing the gradient of the log-likelihood term is expensive. Fortunately, the contrastive divergence learning algorithm gives an efficient approximation to the gradient of the log-likelihood [25]. Building upon this, on each iteration we can apply the contrastive divergence update rule, followed by one step of gradient descent using the gradient of the regularization term.3 The details of our procedure are summarized in Algorithm 1.

Algorithm 1 Sparse RBM learning algorithm

1. Update the parameters using the contrastive divergence learning rule. More specifically,

wij := wij + α(⟨vi hj⟩data − ⟨vi hj⟩recon)
ci := ci + α(⟨vi⟩data − ⟨vi⟩recon)
bj := bj + α(⟨hj⟩data − ⟨hj⟩recon),

where α is a learning rate, and ⟨·⟩recon is an expectation over the reconstruction data, estimated using one iteration of Gibbs sampling (as in Equations 2, 3).

2. Update the parameters using the gradient of the regularization term.

3. Repeat Steps 1 and 2 until convergence.

3.2 Learning deep networks using sparse RBM

Once a layer of the network is trained, the parameters wij, bj, ci are frozen and the hidden unit values given the data are inferred. These inferred values serve as the “data” used to train the next higher layer in the network. Hinton et al. [1] showed that by repeatedly applying such a procedure, one can learn a multilayered deep belief network. 
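The two steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code: the mini-batch handling, the learning rate, and the simplified form of the sparsity-penalty gradient (applied to the hidden biases only, as footnote 3 describes) are our assumptions, and all function and variable names are ours.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_rbm_step(V, W, b, c, sigma=0.4, alpha=0.01, p=0.02, lam=None, rng=None):
    '''One iteration of Algorithm 1 on a mini-batch V of shape (m, k).

    W: (k, n) weights; b: (n,) hidden biases; c: (k,) visible biases.
    Updates the parameters in place and returns them.
    '''
    rng = np.random.default_rng() if rng is None else rng
    if lam is None:
        lam = 1.0 / p                       # lambda = 1/p, as in the paper's experiments
    # Positive phase: hidden activation probabilities, Eq. (3).
    H = logistic((V @ W + b) / sigma**2)
    # One step of Gibbs sampling: sample h, reconstruct v by the mean of
    # Eq. (2), then recompute hidden probabilities for the reconstruction.
    Hs = (rng.random(H.shape) < H).astype(float)
    Vr = c + Hs @ W.T
    Hr = logistic((Vr @ W + b) / sigma**2)
    m = V.shape[0]
    # Step 1: contrastive divergence updates, alpha * (<.>_data - <.>_recon).
    W += alpha * (V.T @ H - Vr.T @ Hr) / m
    c += alpha * (V.mean(axis=0) - Vr.mean(axis=0))
    b += alpha * (H.mean(axis=0) - Hr.mean(axis=0))
    # Step 2: gradient step on the sparsity penalty lambda * |p - mean(h_j)|^2,
    # applied to the hidden biases only. This is a simplified gradient (it
    # drops the logistic slope factor) that pushes each unit's mean
    # activation toward p.
    b += alpha * lam * (p - H.mean(axis=0))
    return W, b, c
```

Repeating this step over mini-batches, then freezing (W, b, c) and feeding the inferred hidden probabilities to a second sparse RBM, gives the greedy layer-wise training of Section 3.2.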
In some cases, this iterative “greedy” algorithm can further be shown to be optimizing a variational bound on the data likelihood, if each layer has at least as many units as the layer below (although in practice this is not necessary to arrive at a desirable solution; see [1] for a detailed discussion). In our experiments using natural images, we learn a network with two hidden layers, with each layer learned using the sparse RBM algorithm described in Section 3.1.

4 Visualization

4.1 Learning “strokes” from handwritten digits

We applied the sparse RBM algorithm to the MNIST handwritten digit dataset.4 We learned a sparse RBM with 69 visible units and 200 hidden units. The learned bases are shown in Figure 2. (Each basis corresponds to one column of the weight matrix W left-multiplied by the unwhitening matrix.) Many bases found by the algorithm roughly represent different “strokes” of which handwritten digits are composed. This is consistent

Figure 2: Bases learned from MNIST data

2Less formally, this regularization ensures that the “firing rate” of the model neurons (corresponding to the latent random variables hj) is kept at a certain (fairly low) level, so that the activations of the model neurons are sparse. Similar intuition was also used in other models (e.g., see Olshausen and Field [10]).

3To increase computational efficiency, we made one additional change. Note that the regularization term is defined using a sum over the entire training set; if we use stochastic gradient descent or mini-batches (small subsets of the training data) to estimate this term, it results in biased estimates of the gradient. 
To ameliorate this, we used mini-batches, but in the gradient step that tries to minimize the regularization term, we update only the bias terms bj (which directly control the degree to which the hidden units are activated, and thus their sparsity), instead of updating all the parameters bj and wij.

4Downloaded from http://yann.lecun.com/exdb/mnist/. Each pixel was normalized to the unit interval, and we used PCA whitening to reduce the dimension to 69 principal components for computational efficiency. (Similar results were obtained without whitening.)

Figure 3: 400 first layer bases learned from the van Hateren natural image dataset, using our algorithm.

Figure 4: Visualization of 200 second layer bases (model V2 receptive fields), learned from natural images. Each small group of 3-5 images (arranged in a row) shows one model V2 unit; the leftmost patch in the group is a visualization of the model V2 basis, and is obtained by taking a weighted linear combination of the first layer “V1” bases to which it is connected. The next few patches in the group show the first layer bases that have the strongest weight connection to the model V2 basis.

with results obtained by applying different algorithms to learn sparse representations of this data set (e.g., [2, 5]).

4.2 Learning from natural images

We also applied the algorithm to a training set of 14-by-14 natural image patches, taken from a dataset compiled by van Hateren.5 We learned a sparse RBM model with 196 visible units and 400 hidden units. 
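Footnote 4 above mentions PCA whitening of the input and visualizing learned bases through the corresponding unwhitening matrix. A minimal sketch of both operations (our own illustrative code; the paper does not specify its implementation):

```python
import numpy as np

def pca_whiten(X, n_components, eps=1e-8):
    '''PCA-whiten X (one example per row): project onto the top principal
    components and rescale so each retained component has unit variance.

    Also returns the unwhitening matrix used to map bases learned in the
    whitened space back to pixel space for visualization.
    '''
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0]
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    E, D = evecs[:, order], evals[order]
    whiten = E / np.sqrt(D + eps)               # (d, n_components)
    unwhiten = E * np.sqrt(D + eps)             # (d, n_components)
    return Xc @ whiten, whiten, unwhiten, mean

# For a weight matrix W of shape (n_components, n_hidden) learned on whitened
# data, pixel-space bases for display are the columns of unwhiten @ W.
```

After whitening, the covariance of the retained components is (approximately) the identity, so the RBM sees decorrelated, unit-variance inputs.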
The learned bases are shown in Figure 3; they are oriented, Gabor-like bases and resemble the receptive fields of V1 simple cells.6

4.3 Learning a two-layer model of natural images using sparse RBMs

We further learned a two-layer network by stacking one sparse RBM on top of another (see Section 3.2 for details).7 After learning, the second layer weights were quite sparse: most of the weights were very small, and only a few were either highly positive or highly negative. Positive

5The images were obtained from http://hlab.phys.rug.nl/imlib/index.html. We used 100,000 14-by-14 image patches randomly sampled from an ensemble of 2000 images; each subset of 200 patches was used as a mini-batch.

6Most other authors’ experiments to date using regular (non-sparse) RBMs, when trained on such data, seem to have learned relatively diffuse, unlocalized bases (ones that do not represent oriented edge filters). While sensitive to the parameter settings and requiring a long training time, we found that it is possible in some cases to get a regular RBM to learn oriented edge filter bases as well. But in our experiments, even in these cases we found that repeating this process to build a two-layer deep belief net (see Section 4.3) did not encode a significant number of corners/angles, unlike one trained using the sparse RBM; therefore, it showed a significantly worse match to the Ito & Komatsu statistics. For example, the fraction of model V2 neurons that respond strongly to a pair of edges near right angles (formally, have peak angle in the range 60-120 degrees) was 2% for the regular RBM, whereas it was 17% for the sparse RBM (and Ito & Komatsu reported 22%). 
See Section 5.1 for more details.

7For the results reported in this paper, we trained the second layer sparse RBM with real-valued visible units; however, the results were very similar when we trained the second layer sparse RBM with binary-valued visible units (except that the second layer weights became less sparse).

Figure 5: Top: Visualization of four learned model V2 neurons. (Visualization in each row of four or five patches follows the format in Figure 4.) Bottom: Angle stimulus response profile for the model V2 neurons in the top row. The 36x36 grid of stimuli follows [7], in which the orientations of the two lines are varied to form different angles. As in Figure 1, darkened patches represent stimuli to which the model V2 neuron responds strongly; also, a small black square indicates the overall peak response.

weights represent excitatory connections between model V1 and model V2 units, whereas negative elements represent inhibitory connections. By visualizing the second layer bases as shown in Figure 4, we observed bases that encoded collinear first layer bases as well as edge junctions. This shows that by extending the sparse RBM to two layers and using greedy learning, the model is able to learn bases that encode contours, angles, and junctions of edges.

5 Evaluation experiments

We now more quantitatively compare the algorithm’s learned responses to biological measurements.8

5.1 Method: Ito-Komatsu paper protocol

We now describe the procedure we used to compare our model with the experimental data in [7]. We generated a stimulus set consisting of the same set of angles (pairs of edges) as [7]. To identify the “center” of each model neuron’s receptive field, we translate all stimuli densely over the 14x14 input image patch, and identify the position at which the maximum response is elicited. 
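This centering step, together with the layer-by-layer computation of response probabilities used in this protocol, can be sketched as follows. The code is an illustrative sketch with our own names; the σ values follow footnote 8, and the weights are assumed to be already trained.

```python
import numpy as np

def logistic(x):
    # Clip for numerical stability when sigma^2 is small.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60.0, 60.0)))

def model_responses(v, W1, b1, W2, b2, s1=0.4, s2=0.05):
    '''Layer-1 activation probabilities (Eq. 3), then fed as data to layer 2.'''
    h1 = logistic((v @ W1 + b1) / s1**2)
    h2 = logistic((h1 @ W2 + b2) / s2**2)
    return h1, h2

def find_center(stimulus, W1, b1, W2, b2, size=14):
    '''Translate a small stimulus over every position of the size x size
    input patch; return the offset eliciting the largest model V2 response.'''
    sh, sw = stimulus.shape
    best, best_pos = -np.inf, (0, 0)
    for r in range(size - sh + 1):
        for col in range(size - sw + 1):
            patch = np.zeros((size, size))
            patch[r:r + sh, col:col + sw] = stimulus
            _, h2 = model_responses(patch.ravel(), W1, b1, W2, b2)
            if h2.max() > best:
                best, best_pos = h2.max(), (r, col)
    return best_pos
```

All subsequent measurements would then use stimuli centered at the returned offset.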
All measures are then taken with all angle stimuli centered at this position.9

Using these stimuli, we compute the hidden unit probabilities from our model V1 and V2 neurons. In other words, for each stimulus we compute the first hidden layer activation probabilities, then feed these probabilities as data to the second hidden layer and compute the activation probabilities again in the same manner. Following a protocol similar to [7], we also eliminate from consideration the model neurons that do not respond strongly to corners and edges.10 Some representative results are shown in Figure 5. (The four angle profiles shown are fairly typical of those obtained in our experiments.) We see that all the V2 bases in Figure 5 have maximal response when their strongest V1-basis components are aligned with the stimulus. Thus, some of these bases do indeed seem to encode edge junctions or crossings.

We also compute summary statistics similar to those of [7] (described in Figure 1(C,D,E)), which more quantitatively measure the distribution of V2 or model V2 responses to the different angle stimuli. Figure 6 plots the responses of our model, together with V2 data taken from [7]. Along many dimensions, the results from our model match those from macaque V2 fairly well.

8The results we report below were very insensitive to the choices of σ and λ. We set σ to 0.4 and 0.05 for the first and second layers (chosen to be on the same scale as the standard deviation of the data and the first-layer activations), and λ = 1/p in each layer. We used p = 0.02 and 0.05 for the first and second layers.

9Other details: The stimulus set is created by generating a binary-mask image that is then scaled to normalize contrast. 
To determine this scaling constant, we used single bar images, translated and rotated to all possible positions, and fixed the constant such that the top 0.5% (over all translations and rotations) of the stimuli activate the model V1 cells above 0.5. This normalization step corrects for the RBM having been trained on a data distribution (natural images) that had very different contrast ranges than our test stimulus set.

10In detail, we generated a set of random low-frequency stimuli, by generating small random KxK (K=2,3,4) images with each pixel drawn from a standard normal distribution, and rescaled the images using bicubic interpolation to 14x14 patches. These stimuli are scaled such that about 5% of the V2 bases fire maximally to these random stimuli. We then exclude the V2 bases that are maximally activated by these random stimuli from the subsequent analysis.

Figure 6: Images show distributions over stimulus response statistics (averaged over 10 trials) from our algorithm (blue) and from data taken from [7] (green). The five figures show respectively (i) the distribution over peak angle response (ranging from 0 to 180 degrees; each bin represents a range of 30 degrees), (ii) distribution over tolerance to the primary line component (Figure 1C, in the dominant vertical or horizontal direction), (iii) distribution over tolerance to the secondary line component (Figure 1C, in the non-dominant direction), (iv) tolerance to angle width (Figure 1D), (v) tolerance to angle orientation (Figure 1E). See the Figure 1 caption, and [7], for details.

Figure 7: Visualization of a number of model V2 neurons that maximally respond to various complex stimuli. Each row of seven images represents one V2 basis. 
In each row, the leftmost image shows a linear combination of the top three weighted V1 components that comprise the V2 basis; the next three images show the top three optimal stimuli; and the last three images show the top three weighted V1 bases. The V2 bases shown in the figures maximally respond to acute angles (left), obtuse angles (middle), and tri-stars and junctions (right).

5.2 Complex shaped model V2 neurons

Our second experiment represents a comparison to a subset of the results described in Hegde and van Essen [23]. We generated a stimulus set comprising some of [23]’s complex shaped stimuli: angles, single bars, tri-stars (three line segments that meet at a point), and arcs/circles, and measured the response of the second layer of our sparse RBM model to these stimuli.11 We observe that many V2 bases are activated mainly by one of these different stimulus classes. For example, some model V2 neurons activate maximally to single bars; some maximally activate to (acute or obtuse) angles; and others to tri-stars (see Figure 7). Further, the number of V2 bases that are maximally activated by acute angles is significantly larger than the number maximally activated by obtuse angles, and the number of V2 bases that respond maximally to tri-stars was much smaller than both preceding cases. This is also consistent with the results described in [23].

6 Conclusions

We presented a sparse variant of the deep belief network model. When trained on natural images, this model learns local, oriented, edge filters in the first layer. More interestingly, the second layer captures a variety of both collinear (“contour”) features as well as corners and junctions, which in a quantitative comparison to measurements of V2 taken by Ito & Komatsu appeared to give responses that were similar along several dimensions. 
This by no means indicates that the cortex is a sparse RBM, but perhaps is more suggestive of contours, corners and junctions being fundamental to the statistics of natural images.12 Nonetheless, we believe that these results also suggest that sparse deep learning algorithms, such as our sparse variant of deep belief nets, hold promise for modeling higher-order features such as might be computed in the ventral visual pathway in the cortex.

11All the stimuli were 14-by-14 pixel image patches. We applied the protocol described in Section 5.1 to the stimulus data, to compute the model V1 and V2 responses.

12In preliminary experiments, we also found that when these ideas are applied to self-taught learning [26] (in which one may use unlabeled data to identify features that are then useful for some supervised learning task), using a two-layer sparse RBM usually results in significantly better features for object recognition than using only a one-layer network.

Acknowledgments

We give warm thanks to Minami Ito, Geoffrey Hinton, Chris Williams, Rajat Raina, Narut Sereewattanawoot, and Austin Shoemaker for helpful discussions. Support from the Office of Naval Research under MURI N000140710747 is gratefully acknowledged.

References

[1] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[2] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In NIPS, 2006.

[3] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. 
Greedy layer-wise training of deep networks. In NIPS, 2006.

[4] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In ICML, 2007.

[5] G. E. Hinton, S. Osindero, and K. Bao. Learning causally linked MRFs. In AISTATS, 2005.

[6] S. Osindero, M. Welling, and G. E. Hinton. Topographic product models applied to natural scene statistics. Neural Computation, 18:381–344, 2006.

[7] M. Ito and H. Komatsu. Representation of angles embedded within contour stimuli in area V2 of macaque monkeys. The Journal of Neuroscience, 24(13):3313–3324, 2004.

[8] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. R. Soc. Lond. B, 265:359–366, 1998.

[9] A. J. Bell and T. J. Sejnowski. The ‘independent components’ of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997.

[10] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.

[11] H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficient sparse coding algorithms. In NIPS, 2007.

[12] D. Hubel and T. Wiesel. Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195:215–243, 1968.

[13] R. L. DeValois, E. W. Yund, and N. Hepler. The orientation and direction selectivity of cells in macaque visual cortex. Vision Res., 22:531–544, 1982.

[14] H. B. Barlow. The coding of sensory messages. Current Problems in Animal Behavior, 1961.

[15] P. O. Hoyer and A. Hyvarinen. A multi-layer sparse coding network learns contour coding from natural images. Vision Research, 42(12):1593–1605, 2002.

[16] Y. Karklin and M. S. Lewicki. 
A hierarchical Bayesian model for learning non-linear statistical regularities in non-stationary natural signals. Neural Computation, 17(2):397–423, 2005.

[17] A. Hyvarinen and P. O. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7):1705–1720, 2000.

[18] A. Hyvärinen, P. O. Hoyer, and M. O. Inki. Topographic independent component analysis. Neural Computation, 13(7):1527–1558, 2001.

[19] A. Hyvarinen, M. Gutmann, and P. O. Hoyer. Statistical model of natural stimuli predicts edge-like pooling of spatial frequency channels in V2. BMC Neuroscience, 6:12, 2005.

[20] L. Wiskott and T. Sejnowski. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4):715–770, 2002.

[21] G. Boynton and J. Hegde. Visual cortex: The continuing puzzle of area V2. Current Biology, 14(13):R523–R524, 2004.

[22] J. B. Levitt, D. C. Kiper, and J. A. Movshon. Receptive fields and functional architecture of macaque V2. Journal of Neurophysiology, 71(6):2517–2542, 1994.

[23] J. Hegde and D. C. Van Essen. Selectivity for complex shapes in primate visual area V2. Journal of Neuroscience, 20:RC61–66, 2000.

[24] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[25] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.

[26] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng. Self-taught learning: Transfer learning from unlabeled data. 
In ICML, 2007.\n", "award": [], "sourceid": 934, "authors": [{"given_name": "Honglak", "family_name": "Lee", "institution": null}, {"given_name": "Chaitanya", "family_name": "Ekanadham", "institution": null}, {"given_name": "Andrew", "family_name": "Ng", "institution": null}]}