{"title": "The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction", "book": "Advances in Neural Information Processing Systems", "page_first": 1521, "page_last": 1528, "abstract": "Bandpass filtering, orientation selectivity, and contrast gain control are prominent features of sensory coding at the level of V1 simple cells. While the effect of bandpass filtering and orientation selectivity can be assessed within a linear model, contrast gain control is an inherently nonlinear computation. Here we employ the class of $L_p$ elliptically contoured distributions to investigate the extent to which the two features---orientation selectivity and contrast gain control---are suited to model the statistics of natural images. Within this framework we find that contrast gain control can play a significant role for the removal of redundancies in natural images. Orientation selectivity, in contrast, has only a very limited potential for redundancy reduction.", "full_text": "The Conjoint Effect of Divisive Normalization and\nOrientation Selectivity on Redundancy Reduction in\nNatural Images\n\nFabian Sinz\n\nMPI for Biological Cybernetics\n\n72076 T\u00fcbingen, Germany\n\nfabee@tuebingen.mpg.de\n\nMatthias Bethge\n\nMPI for Biological Cybernetics\n\n72076 T\u00fcbingen, Germany\n\nmbethge@tuebingen.mpg.de\n\nAbstract\n\nBandpass filtering, orientation selectivity, and contrast gain control are prominent\nfeatures of sensory coding at the level of V1 simple cells. While the effect of\nbandpass filtering and orientation selectivity can be assessed within a linear model,\ncontrast gain control is an inherently nonlinear computation. Here we employ the\nclass of Lp elliptically contoured distributions to investigate the extent to which\nthe two features, orientation selectivity and contrast gain control, are suited to\nmodel the statistics of natural images. 
Within this framework we find that contrast\ngain control can play a significant role for the removal of redundancies in natural\nimages. Orientation selectivity, in contrast, has only a very limited potential for\nredundancy reduction.\n\n1 Introduction\n\nIt is a long-standing hypothesis that sensory systems are adapted to the statistics of their inputs.\nThese natural signals are by no means random, but exhibit plenty of regularities. Motivated by\ninformation theoretic principles, Attneave and Barlow suggested that one important purpose of this\nadaptation in sensory coding is to model and reduce the redundancies [4; 3] by transforming the\nsignal into a statistically independent representation.\nThe problem of redundancy reduction can be split into two parts: (i) finding a good statistical model\nof the natural signals and (ii) finding a way to map them into a factorial representation. The first part\nis relevant not only to the study of biological systems, but also to technical applications such as\ncompression and denoising. The second part offers a way to link neural response properties to\ncomputational principles, since neural representations of natural signals must be advantageous in\nterms of redundancy reduction if the hypothesis is true. Both aspects have been extensively\nstudied for natural images [2; 5; 8; 19; 20; 21; 24]. In particular, it has been shown that applying\nIndependent Component Analysis (ICA) to natural images consistently and robustly yields filters\nthat are localized, oriented and show bandpass characteristics [19; 5]. Since those features are also\nascribed to the receptive fields of neurons in the primary visual cortex (V1), it has been suggested\nthat the receptive fields of V1 neurons are shaped to form a minimally redundant representation of\nnatural images [5; 19].\nFrom a redundancy reduction point of view, ICA offers a small but significant advantage over other\nlinear representations [6]. 
In terms of density estimation, however, it is a poor model for natural\nimages since already a simple non-factorial spherically symmetric model yields a much better fit to\nthe data [10].\nRecently, Lyu and Simoncelli proposed a method that converts any spherically symmetric distribution\ninto a (factorial) Gaussian (or Normal distribution) by using a non-linear transformation of the\nnorm of the image patches [17]. This yields a non-linear redundancy reduction mechanism, which\nexploits the superiority of the spherically symmetric model over ICA. Interestingly, the non-linearity\nof this Radial Gaussianization method closely resembles another feature of the early visual system,\nknown as contrast gain control [13] or divisive normalization [20]. However, since spherically symmetric\nmodels are invariant under orthogonal transformations, they are agnostic to the particular\nchoice of basis in the whitened space. Thus, there is no role for the shape of the filters in this model.\nCombining the observations from the two models of natural images, we can draw two conclusions:\nOn the one hand, ICA is not a good model for natural images, because a simple spherically symmetric\nmodel yields a much better fit [10]. On the other hand, the spherically symmetric model in\nRadial Gaussianization cannot capture that ICA filters do yield a higher redundancy reduction than\nother linear transformations. This leaves us with the question whether we can understand the emergence\nof oriented filters in a more general redundancy reduction framework, which also includes a\nmechanism for contrast gain control.\nIn this work we address this question by using the more general class of Lp-spherically symmetric\nmodels [23; 12; 15]. These models are quite similar to spherically symmetric models, but do depend\non the particular shape of the linear filters. 
Just as spherically symmetric models can be non-linearly\ntransformed into isotropic Gaussians, Lp-spherically symmetric models can be mapped into\na unique class of factorial distributions, called p-generalized Normal distributions [11]. Thus, we\nare able to quantify the influence of orientation selective filters and contrast gain control on the\nredundancy reduction of natural images in a joint model.\n\n2 Models and Methods\n\n2.1 Decorrelation and Filters\n\nAll probabilistic models in this paper are defined on whitened natural images. Let $C$ be the covariance\nmatrix of the pixel intensities for an ensemble $x_1, ..., x_m$ of image patches; then $C^{-1/2}$\nconstitutes the symmetric whitening transform. Note that all vectors $y = V C^{-1/2} x$, with $V$ being\nan orthogonal matrix, have unit covariance. The matrix $V C^{-1/2}$ yields the linear filters that are applied to the raw\nimage patches before feeding them into the probabilistic models described below. Since any decorrelation\ntransform can be written as $V C^{-1/2}$, the choice of $V$ determines the shape of the linear filters.\nIn our experiments, we use three different kinds of $V$:\nSYM The simplest choice is $V_{SYM} = I$, i.e. $y = C^{-1/2} x$ contains the coefficients in the symmetric\nwhitening basis. From a biological perspective, this case is interesting as the filters resemble receptive\nfields of retinal ganglion cells with center-surround properties.\nICA The filters $V_{ICA}$ of ICA are determined by maximizing the non-Gaussianity of the marginal\ndistributions. For natural image patches, ICA is known to yield orientation selective filters in resemblance\nto V1 simple cells. 
While other orientation selective bases are possible, the filters defined\nby $V_{ICA}$ correspond to the optimal choice for redundancy reduction under the restriction to linear\nmodels.\nHAD The coefficients in the basis $V_{HAD} = \\frac{1}{\\sqrt{m}} H V_{ICA}$, with $H$ denoting an arbitrary Hadamard\nmatrix, correspond to a sum over the different ICA coefficients, each possibly having a flipped sign.\nHadamard matrices are defined by the two properties $H_{ij} = \\pm 1$ and $H H^\\top = mI$. This case can\nbe seen as the opposite extreme to the case of ICA. Instead of running an independent search for the\nmost Gaussian marginals, the central limit theorem is used to produce the most Gaussian components\nby using the Hadamard transformation to mix all ICA coefficients with equal weight, resorting\nto the independence assumption underlying ICA.\n\n2.2 Lp-spherically Symmetric Distributions\n\nThe contour lines of spherically symmetric distributions have constant Euclidean norm. Similarly,\nthe contour lines of Lp-spherically symmetric distributions have constant p-norm (footnote 1)\n$||y||_p := \\sqrt[p]{\\sum_{i=1}^n |y_i|^p}$. The set of vectors with constant p-norm,\n$S_p^{n-1}(r) := \\{y \\in \\mathbb{R}^n : ||y||_p = r, p > 0, r > 0\\}$, is called p-sphere of radius r. Different examples of p-spheres are shown along the\ncoordinate axis of Figure 1. For $p \\neq 2$ the distribution is not invariant under arbitrary orthogonal\ntransformations, which means that the choice of the basis $V$ can make a difference in the likelihood\nof the data.\n\nFootnote 1: Note that $||y||_p$ is only a norm in the strict sense if $p \\geq 1$. However, since the following considerations also\nhold for $0 < p < 1$, we will employ the term \u201cp-norm\u201d and the notation \u201c$||y||_p$\u201d for notational convenience.\n\nFigure 1: The spherically symmetric distributions are a subset of the Lp-spherically symmetric distributions. 
The right shapes indicate the iso-density lines for the different distributions. The Gaussian\nis the only L2-spherically symmetric distribution with independent marginals. Like the Gaussian\ndistribution, all p-generalized Normal distributions have independent marginals. ICA, SYM, ... denote\nthe models used in the experiments below.\n\nA multivariate random variable $Y$ is called Lp-spherically symmetric distributed if it can be written\nas a product $Y = RU$, where $U$ is uniformly distributed on $S_p^{n-1}(1)$ and $R$ is a univariate non-negative\nrandom variable with an arbitrary distribution [23; 12]. Intuitively, $R$ corresponds to the\nradial component, i.e. the length $||y||_p$ measured with the p-norm. $U$ describes the directional components\nin a polar-like coordinate system (see Extra Material). It can be shown that this definition\nis equivalent to the density $\\rho(y)$ of $Y$ having the form $\\rho(y) = f(||y||_p^p)$ [12]. This immediately\nsuggests two ways of constructing an Lp-spherically symmetric distribution. Most obviously, one\ncan specify a density $\\rho(y)$ that has the form $\\rho(y) = f(||y||_p^p)$. An example is the p-generalized\nNormal distribution (gN) [11]\n\n$$\\rho(y) = \\frac{p^n}{\\Gamma^n\\left(\\frac{1}{p}\\right) (2\\sigma^2)^{n/p} 2^n} \\exp\\left(-\\frac{\\sum_{i=1}^n |y_i|^p}{2\\sigma^2}\\right) = f(||y||_p^p). \\qquad (1)$$\n\nAnalogous to the Gaussian being the only factorial spherically symmetric distribution [1], this distribution\nis the only Lp-spherically symmetric distribution with independent marginals [22]. For the\np-generalized Normal, the marginals are members of the exponential power family.\nIn our experiments, we will use the p-generalized Normal to model linear marginal independence by\nfitting it to the coefficients of the various bases in whitened space. 
Since this distribution is sensitive\nto the particular filter shapes for $p \\neq 2$, we can assess how well the distribution of the linearly\ntransformed image patches is matched by a factorial model.\nAn alternative way of constructing an Lp-spherically symmetric distribution is to specify the radial\ndistribution $\\rho_r$. One example, which will be used later, is obtained by choosing a mixture of Log-Normal\ndistributions (RMixLogN). In Cartesian coordinates, this yields the density\n\n$$\\rho(y) = \\frac{p^{n-1} \\Gamma\\left(\\frac{n}{p}\\right)}{2^n \\Gamma^n\\left(\\frac{1}{p}\\right)} \\sum_{k=1}^K \\frac{\\eta_k}{||y||_p^n \\sigma_k \\sqrt{2\\pi}} \\exp\\left(-\\frac{(\\log ||y||_p - \\mu_k)^2}{2\\sigma_k^2}\\right). \\qquad (2)$$\n\nAn immediate consequence of any Lp-spherically symmetric distribution being specified by its radial\ndensity is the possibility to change between any two of those distributions by transforming the\nradial component with $(F_2^{-1} \\circ F_1)(||y||_p)$, where $F_1$ and $F_2$ are the cumulative distribution functions\n(cdf) of the source and the target density, respectively. In particular, for a fixed p, any Lp-spherically\nsymmetric distribution can be transformed into a factorial one by the transform\n\n$$z = g(y) \\cdot y = \\frac{(F_2^{-1} \\circ F_1)(||y||_p)}{||y||_p} \\, y.$$\n\nThis transform closely resembles contrast gain control models for primary visual cortex [13; 20],\nwhich use a different gain function having the form $\\tilde{g}(y) = \\frac{1}{c+r}$ with $r = ||y||_2^2$ [17].\nWe will use the distribution of equation (2) to describe the joint model consisting of a linear filtering\nstep followed by a contrast gain control mechanism. Once the linear filter responses in whitened\nspace are fitted with this distribution, we non-linearly transform it into the factorial p-generalized\nNormal by the transformation $g(y) \\cdot y = (F_{\\mathrm{gN}}^{-1} \\circ F_{\\mathrm{RMixLogN}})(||y||_p)/||y||_p \\cdot y$.\nFinally, note that because an Lp-spherically symmetric distribution is specified by its univariate radial\ndistribution, fitting it to data boils down to estimating the univariate density for $R$, which can be done\nefficiently and robustly.\n\n3 Experiments and Results\n\n3.1 Dataset\n\nWe use the dataset from the Bristol Hyperspectral Images Database [7], which was already used in\nprevious studies [25; 16]. All images had a resolution of 256\u00d7256 pixels and were converted to gray\nlevel by averaging over the channels. From each image circa 5000 patches of size 15\u00d715 pixels were\ndrawn at random locations for training (circa 40000 patches in total) as well as circa 6250 patches\nper image for testing (circa 50000 patches in total). In total, we sampled ten pairs of training and\ntest sets in that way. All results below are averaged over those. Before computing the linear filters,\nthe DC component was projected out with an orthogonal transformation using a QR decomposition.\nAfterwards, the data was rescaled in order to make whitening a volume conserving transformation\n(a transformation with determinant one) since those transformations leave the entropy unchanged.\n\n3.2 Evaluation Measure\n\nIn all our experiments, we used the Average Log Loss (ALL) to assess the quality of the fit and\nthe redundancy reduction achieved. The ALL $= \\frac{1}{n} E_\\rho[-\\log_2 \\hat{\\rho}(y)] \\approx \\frac{1}{mn} \\sum_{k=1}^m -\\log_2 \\hat{\\rho}(y_k)$ is\nthe negative mean log-likelihood of the model distribution under the true distribution. If the model\ndistribution matches the true one, the ALL equals the entropy. Otherwise, the difference between\nthe ALL and the entropy of the true distribution is exactly the Kullback-Leibler divergence between\nthe two. The difference between the ALLs of two models equals the reduction in multi-information\n(see Extra Material) and can therefore be used to quantify the amount of redundancy reduction.\n\n3.3 Experiments\n\nWe fitted the Lp-spherically symmetric distributions from equations (1) and (2) to the image patches\nin the bases HAD, SYM, and ICA by a maximum likelihood fit on the radial component. For the\nmixture of Log-Normal distributions, we used EM for a mixture of Gaussians on the logarithm of\nthe p-norm of the image patches.\nFor each model, we computed the maximum likelihood estimate of the model parameters and determined\nthe best value for p according to the ALL in bits per component on a training set. The final\nALL was computed on a separate test set.\nFor ICA, we performed a gradient descent over the orthogonal group on the log-likelihood of a\nproduct of independent exponential power distributions, where we used the result of the FastICA\nalgorithm by Hyv\u00e4rinen et al. as starting point [14]. All transforms were computed separately\nfor each training set.\n\nFigure 2: ALL in bits per component as a function of p. The linewidth corresponds to the standard\ndeviation over ten pairs of training and test sets. 
Left: ALL for the bases HAD, SYM and ICA under\nthe p-generalized Normal (HAD, SYM, ICA) and the non-factorial Lp-spherically symmetric model with\nthe radial component modeled by a mixture of Log-Normal distributions (cHAD, cSYM, cICA).\nRight: Bar plot for the different ALL indicated by horizontal lines in the left plot.\n\nIn order to compare the redundancy reduction of the different transforms with respect to the pixel\nbasis (PIX), we computed a non-parametric estimate of the marginal entropies of the patches before\nthe DC component was projected out [6]. Since the estimation is not bound to a particular parametric\nmodel, we used the mean of the marginal entropies as an estimate of the average log-loss in the pixel\nrepresentation.\n\n3.4 Results\n\nFigure 2 and Table 1 show the ALL for the bases HAD, SYM, and ICA as a function of p. The\nupper curve bundle represents the factorial p-generalized Normal model, the lower bundle the non-factorial\nmodel with the radial component modeled by a mixture of Log-Normal distributions with\nfive mixture components. The ALL for the factorial models always exceeds the ALL for the non-factorial\nmodels. At p = 2, the curves within each bundle intersect, because all models are invariant under a change of basis for\nthat value. Note that the smaller ALL of the non-factorial model cannot be attributed to the mixture\nof Log-Normal distributions having more degrees of freedom. As mentioned in the introduction, the\np-generalized Normal is the only factorial Lp-spherically symmetric distribution [22]. Therefore,\nmarginal independence is such a rigid assumption that the output scale is the only degree of freedom\nleft.\nFrom the left plot in Figure 2, we can assess the influence of the different filter shapes and contrast\ngain control on the redundancy reduction of natural images. We used the best ALL of the HAD\nbasis under the p-generalized Normal as a baseline for a whitening transformation without contrast\ngain control (HAD). 
Analogously, we used the best ALL of the HAD basis under the non-factorial\nmodel as a baseline for a pure contrast gain control model (cHAD). We compared these values\nto the best ALL obtained by using the SYM and the ICA basis under both models. Because the\nfilters of SYM and ICA resemble receptive field properties of retinal ganglion cells and V1 simple\ncells, respectively, we can assess their possible influence on the redundancy reduction with and\nwithout contrast gain control. The factorial model corresponds to the case without contrast gain\ncontrol (SYM and ICA). Since we have shown that the non-factorial model can be transformed into\na factorial one by a p-norm based divisive normalization operation, these scores correspond to the\ncases with contrast gain control (cSYM and cICA). The different cases are depicted by the horizontal\nlines in Figure 2.\nAs already reported in other works, plain orientation selectivity adds only very little to the redundancy\nreduction achieved by decorrelation and is less effective than the baseline contrast gain control\nmodel [10; 6; 17]. If both orientation selectivity and contrast gain control are combined (cICA),\nit is possible to achieve about 9% extra redundancy reduction in addition to baseline whitening\n(HAD). By setting the other models in relation to the best joint model (cICA := 100%), we are able\nto tell apart the relative contributions of bandpass filtering (HAD = 91%), particular filter shapes\n(SYM = 93%, ICA = 94%), contrast gain control (cHAD = 98.6%) as well as combined models\n(cSYM = 99%, cICA := 100%) to redundancy reduction (see Table 1). Thus, orientation selectivity\n(ICA) contributes less to the overall redundancy reduction than any model with contrast gain control\n(cHAD, cSYM, cICA). Additionally, the relative difference between the joint model (cICA) and\nplain contrast gain control (cHAD) is only about 1.4%. For cSYM it is even less, about 0.7%. The\ndifference in redundancy reduction between center-surround filters and orientation selective filters\nbecomes even smaller in combination with contrast gain control (1.3% for ICA vs. SYM, 0.7% for\ncICA vs. cSYM). However, it is still significant (t-test, $p = 5.5217 \\cdot 10^{-9}$).\n\nComparison | Absolute Difference [Bits/Comp.] | Relative Difference [% wrt. cICA]\nHAD - PIX | \u22123.2947 \u00b1 0.0018 | 91.0016 \u00b1 0.0832\nSYM - PIX | \u22123.3638 \u00b1 0.0022 | 92.9087 \u00b1 0.0782\nICA - PIX | \u22123.4110 \u00b1 0.0024 | 94.2135 \u00b1 0.0747\ncHAD - PIX | \u22123.5692 \u00b1 0.0045 | 98.5839 \u00b1 0.0134\ncSYM - PIX | \u22123.5945 \u00b1 0.0047 | 99.2815 \u00b1 0.0098\ncICA - PIX | \u22123.6205 \u00b1 0.0049 | 100.0000 \u00b1 0.0000\n\nTable 1: Difference in ALL for gray value images with standard deviation over ten training and test\nset pairs. The column on the left displays the absolute difference to the PIX representation. The\ncolumn on the right shows the relative difference with respect to the largest reduction achieved by\nICA with non-factorial model.\n\nFigure 3: The curve in the upper right corner depicts the transformation\n$||z||_p = (F_{\\mathrm{gN}}^{-1} \\circ F_{\\mathrm{RMixLogN}})(||y||_p)$ of the radial component in the ICA basis for\ngray scale images. The resulting radial distribution over $||z||_p$ corresponds to the radial distribution\nof the p-generalized Normal. The inset shows the gain function\n$g(||y||_p) = \\frac{(F_{\\mathrm{gN}}^{-1} \\circ F_{\\mathrm{RMixLogN}})(||y||_p)}{||y||_p}$ in log-log coordinates. The scale parameter\nof the p-generalized Normal was chosen such that the marginal had unit variance.\n\nWhen examining the gain functions $g(||y||_p) = \\frac{(F_{\\mathrm{gN}}^{-1} \\circ F_{\\mathrm{RMixLogN}})(||y||_p)}{||y||_p}$ resulting from the transformation\nof the radial components, we find that they approximately exhibit the form $g(||y||_p) = \\frac{c}{||y||_p^\\kappa}$.\nThe inset in Figure 3 shows the gain control function $g(||y||_p)$ in a log-log plot. While standard contrast\ngain control models assume $p = 2$ and $\\kappa = 2$, we find values of $\\kappa$ between 0.90 and 0.93 to be optimal\nfor redundancy reduction. The optimal $p$ depends on the shape of the linear filters and ranges from approximately\n1.2 to 2. In addition, existing contrast gain models assume the form $g(||y||_2) = \\frac{1}{\\sigma + ||y||_2^2}$,\nwhile we find that $\\sigma$ must be approximately zero.\n\nFigure 4: Filters optimized for ICA (left) and for the p-spherically symmetric model with radial\nmixture of Log-Normal distributions starting from the ICA solution (middle) and from a random\nbasis (right). The first filter corresponds to the DC component, the others to the filter shapes under\nthe respective model. Qualitatively the filter shapes are very similar. The ALL for the ICA basis\nunder the mixture of Log-Normal model is 1.6748 \u00b1 0.0058 bits/component (left), the ALL with the\noptimized filters is 1.6716 \u00b1 0.0056 (middle) and 1.6841 \u00b1 0.0068 (right).\n\nIn the results above, the ICA filters always achieve the lowest ALL under both p-spherically symmetric\nmodels. To examine whether these filters really represent the best choice, we also optimized\nthe filter shapes under the model of equation (2) via maximum likelihood estimation on the\northogonal group in whitened space [9; 18]. Figure 4 shows the filter shapes for ICA and the ones\nobtained from the optimization, where we used either the ICA solution or a random orthogonal matrix\nas starting point. Qualitatively, the filters look very similar. The ALL also changed just\nmarginally from 1.6748 \u00b1 0.0058 to 1.6716 \u00b1 0.0056 or 1.6841 \u00b1 0.0068, respectively. 
Thus, the\nICA filters are a stable and optimal solution under the model with contrast gain control, too.\n\n4 Summary\n\nIn this report, we studied the conjoint effect of contrast gain control and orientation selectivity on\nredundancy reduction for natural images. In particular, we showed how the Lp-spherically symmetric distribution\ncan be used to tune a nonlinearity of contrast gain control to remove higher-order redundancies\nin natural images.\nThe idea of using an Lp-spherically symmetric model for natural images has already been brought\nup by Hyv\u00e4rinen and K\u00f6ster in the context of Independent Subspace Analysis [15]. However, they\ndo not use the Lp-distribution for contrast gain control, but apply a global contrast gain control filter\non the images before fitting their model. They also use a less flexible Lp-distribution since their goal\nis to fit an ISA model to natural images and not to carry out a quantitative comparison as we did.\nIn our work, we find that the gain control function turns out to follow a power law, which parallels\nthe classical model of contrast gain control. In addition, we find that edge filters also emerge in the\nnon-linear model which includes contrast gain control. The relevance of orientation selectivity for\nredundancy reduction, however, is further reduced. In the linear framework (possibly endowed with\na point-wise nonlinearity for each neuron) the contribution of orientation selectivity to redundancy\nreduction has been shown to be smaller than 5% relative to whitening (i.e. bandpass filtering)\nalone [6; 10]. Here, we found that the contribution of orientation selectivity is even smaller than two\npercent relative to whitening plus gain control. 
Thus, this quantitative model comparison provides\nfurther evidence that orientation selectivity is not critical for redundancy reduction, while contrast\ngain control may play a more important role.\n\nAcknowledgements\n\nThe authors would like to thank Reshad Hosseini, Sebastian Gerwinn and Philipp Berens for fruitful discussions.\nThis work is supported by the German Ministry of Education, Science, Research and Technology through\nthe Bernstein award to MB (BMBF; FKZ: 01GQ0601), a scholarship of the German National Academic Foundation\nto FS, and the Max Planck Society.\n\nReferences\n[1] S. F. Arnold and J. Lynch. On Ali\u2019s characterization of the spherical normal distribution. Journal of the Royal Statistical Society. Series B (Methodological), 44(1):49\u201351, 1982.\n[2] J. J. Atick. Could information theory provide an ecological theory of sensory processing? Network, 3:213\u2013251, 1992.\n[3] F. Attneave. Informational aspects of visual perception. Psychological Review, 61:183\u2013193, 1954.\n[4] H. B. Barlow. Sensory mechanisms, the reduction of redundancy, and intelligence. In The Mechanisation of Thought Processes, pages 535\u2013539, London: Her Majesty\u2019s Stationery Office, 1959.\n[5] A. J. Bell and T. J. Sejnowski. The \u201cindependent components\u201d of natural scenes are edge filters. Vision Res., 37(23):3327\u201338, 1997.\n[6] M. Bethge. Factorial coding of natural images: How effective are linear models in removing higher-order dependencies? J. Opt. Soc. Am. A, 23(6):1253\u20131268, June 2006.\n[7] G. J. Brelstaff, A. Parraga, T. Troscianko, and D. Carr. Hyperspectral camera system: acquisition and analysis. In B. J. Lurie, J. J. Pearson, and E. Zilioli, editors, Proceedings of SPIE, volume 2587, pages 150\u2013159, 1995. The database can be downloaded from: http://psy223.psy.bris.ac.uk/hyper/.\n[8] G. Buchsbaum and A. Gottschalk. 
Trichromacy, opponent colours coding and optimum colour information transmission in the retina. Proceedings of the Royal Society of London. Series B, Biological Sciences, 220:89\u2013113, November 1983.\n[9] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20(2):303\u2013353, 1999.\n[10] J. Eichhorn, F. Sinz, and M. Bethge. Simple cell coding of natural images in V1: How much use is orientation selectivity? (arxiv:0810.2872v1). 2008.\n[11] I. R. Goodman and S. Kotz. Multivariate \u03b8-generalized normal distributions. Journal of Multivariate Analysis, 3:204\u2013219, 1973.\n[12] A. K. Gupta and D. Song. lp-norm spherical distribution. Journal of Statistical Planning and Inference, 60:241\u2013260, 1997.\n[13] D. J. Heeger. Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9:181\u2013198, 1992.\n[14] A. Hyv\u00e4rinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, 2001.\n[15] A. Hyv\u00e4rinen and U. K\u00f6ster. Complex cell pooling and the statistics of natural images. Network, 18:81\u2013100, 2007.\n[16] T.-W. Lee, T. Wachtler, and T. J. Sejnowski. Color opponency is an efficient representation of spectral properties in natural scenes. Vision Res, 42(17):2095\u20132103, Aug 2002.\n[17] S. Lyu and E. P. Simoncelli. Nonlinear extraction of \u2019independent components\u2019 of elliptically symmetric densities using radial Gaussianization. Technical Report TR2008-911, Computer Science Technical Report, Courant Inst. of Mathematical Sciences, New York University, April 2008.\n[18] J. H. Manton. Optimization algorithms exploiting unitary constraints. IEEE Transactions on Signal Processing, 50:635\u2013650, 2002.\n[19] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. 
Nature, 381:607\u2013609, June 1996.\n[20] O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819\u2013825, August 2001.\n[21] E. P. Simoncelli and O. Schwartz. Modeling surround suppression in V1 neurons with a statistically-derived normalization model. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Adv. Neural Information Processing Systems (NIPS*98), volume 11, pages 153\u2013159, Cambridge, MA, 1999. MIT Press.\n[22] F. H. Sinz, S. Gerwinn, and M. Bethge. Characterization of the p-generalized normal distribution. Journal of Multivariate Analysis, 07/26/ 2008.\n[23] D. Song and A. K. Gupta. lp-norm uniform distribution. Proceedings of the American Mathematical Society, 125:595\u2013601, 1997.\n[24] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc R Soc Lond B Biol Sci., 265(1394):1724\u20131726, 1998.\n[25] T. Wachtler, T.-W. Lee, and T. J. Sejnowski. Chromatic structure of natural scenes. Journal of the Optical Society of America. A, Optics, image science, and vision, 18:65\u201377, 2001. PMID: 11152005.", "award": [], "sourceid": 635, "authors": [{"given_name": "Fabian", "family_name": "Sinz", "institution": null}, {"given_name": "Matthias", "family_name": "Bethge", "institution": null}]}