{"title": "Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes", "book": "Advances in Neural Information Processing Systems", "page_first": 866, "page_last": 872, "abstract": null, "full_text": "Color Opponency Constitutes A Sparse \n\nRepresentation For the Chromatic \n\nStructure of Natural Scenes \n\nTe-Won Lee; Thomas Wachtler and Terrence Sejnowski \n\nInstitute for Neural Computation, University of California, San Diego & \n\nComputational Neurobiology Laboratory, The Salk Institute \n\n10010 N. Torrey Pines Road \n\nLa Jolla, California 92037, USA \n\n{tewon,thomas,terry}~salk.edu \n\nAbstract \n\nThe human visual system encodes the chromatic signals conveyed \nby the three types of retinal cone photoreceptors in an opponent \nfashion. This color opponency has been shown to constitute an \nefficient encoding by spectral decorrelation of the receptor signals. \nWe analyze the spatial and chromatic structure of natural scenes by \ndecomposing the spectral images into a set of linear basis functions \nsuch that they constitute a representation with minimal redun(cid:173)\ndancy. Independent component analysis finds the basis functions \nthat transforms the spatiochromatic data such that the outputs \n(activations) are statistically as independent as possible, i.e. least \nredundant. The resulting basis functions show strong opponency \nalong an achromatic direction (luminance edges), along a blue(cid:173)\nyellow direction, and along a red-blue direction. Furthermore, the \nresulting activations have very sparse distributions, suggesting that \nthe use of color opponency in the human visual system achieves a \nhighly efficient representation of colors. Our findings suggest that \ncolor opponency is a result of the properties of natural spectra and \nnot solely a consequence of the overlapping cone spectral sensitiv(cid:173)\nities. 
\n\n1 Statistical structure of natural scenes \n\nEfficient encoding of visual sensory information is an important task for informa(cid:173)\ntion processing systems and its study may provide insights into coding principles \nof biological visual systems. An important goal of sensory information processing \n\nElectronic version available at www. cnl. salk . edu/ \"\"tewon. \n\n\fis to transform the input signals such that the redundancy between the inputs is \nreduced. In natural scenes, the image intensity is highly predictable from neighbor(cid:173)\ning measurements and an efficient representation preserves the information while \nthe neuronal output is minimized. Recently, several methods have been proposed \nfor finding efficient codes for achromatic images of natural scenes [1, 2, 3, 4]. While \nluminance dominates the structure of the visual world, color vision provides impor(cid:173)\ntant additional information about our environment. Therefore, we are interested \nin efficient, i.e. redundancy reducing representations for the chromatic structure of \nnatural scenes. \n\n2 Learning efficient representation for chromatic image \n\nOur goal was to find efficient representations of the chromatic sensory information \nsuch that its spatial and chromatic redundancy is reduced significantly. The method \nwe used for finding statistically efficient representations is independent component \nanalysis (ICA). ICA is a way of finding a linear non-orthogonal co-ordinate system \nin multivariate data that minimizes mutual information among the axial projections \nof the data. The directions of the axes of this co-ordinate system (basis functions) \nare determined by both second and higher-order statistics of the original data, com(cid:173)\npared to Principal Component Analysis (PCA) which is used solely in second order \nstatistics and has orthogonal basis functions. 
The goal of ICA is to perform a linear transform which makes the resulting source outputs as statistically independent from each other as possible [5]. ICA assumes an unknown source vector s with mutually independent components s_i. A small patch of the observed image is stretched into a vector x that can be represented as a linear combination of the source components s_i such that \n\nx = As, (1) \n\nwhere A is a square mixing matrix and the columns of A are the basis functions. Since A and s are unknown, the goal of ICA is to adapt the basis functions by estimating s so that the individual components s_i are statistically independent; this adaptation process minimizes the mutual information between the components s_i. A learning algorithm can be derived using the information maximization principle [5] or the maximum likelihood estimation (MLE) method, which can be shown to be equivalent in this case. In our experiments, we used the infomax learning rule with the natural gradient extension, and the learning algorithm for the basis functions is \n\nΔA ∝ A(φ(s)s^T - I), (2) \n\nwhere I is the identity matrix, φ(s) = -∂log p(s)/∂s is the score function of the source density, and s^T denotes the transpose of s. ΔA is the change of the basis functions that is added to A; it converges to zero once the adaptation process is complete, i.e., when E[φ(s)s^T] = I. Note that φ(s) requires a density model for p(s_i). We used a parametric exponential power density p(s_i) ∝ exp(-|s_i|^{q_i}) and simultaneously updated its shape by inferring the value q_i to match the distribution of the estimated sources [6]. This is accomplished by finding the maximum a posteriori value of q_i given the observed data. The ICA algorithm can thus characterize a wide class of statistical distributions, including uniform, Gaussian, Laplacian, and other so-called sub- and super-Gaussian densities. 
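As an illustration, the adaptation rule above can be sketched in a few lines of numpy. This is a minimal sketch with a fixed exponent q rather than the adaptive exponential power prior of [6]; the function name, batch size, and learning rate are our own choices and are not taken from the paper.

```python
import numpy as np

def ica_infomax_natural_gradient(X, n_steps=5000, lr=0.02, q=1.0, rng=None):
    """Adapt ICA basis functions A for data X (dims x samples) with the
    natural-gradient infomax rule

        dA  proportional to  A (phi(s) s^T - I),

    where s = A^{-1} x and phi(s) = -d log p(s)/ds is the score function of
    an exponential power density p(s_i) ~ exp(-|s_i|^q)  (q = 1: Laplacian).
    The update vanishes on average at the fixed point E[phi(s) s^T] = I.
    """
    rng = np.random.default_rng(rng)
    n, n_samples = X.shape
    A = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # start near identity
    for _ in range(n_steps):
        batch = X[:, rng.integers(0, n_samples, size=100)]
        s = np.linalg.solve(A, batch)                    # s = A^{-1} x
        phi = q * np.sign(s) * np.abs(s) ** (q - 1.0)    # score of the prior
        A += lr * A @ (phi @ s.T / batch.shape[1] - np.eye(n))
    return A
```

With q = 1 the score reduces to φ(s) = sign(s), the Laplacian case; for a well-conditioned mixture of super-Gaussian sources this rule recovers the mixing matrix up to permutation and scaling of the columns.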
In other words, our experiments do not constrain the coefficients to have a sparse distribution, unlike some previous methods [1, 2]. The algorithm converged to a solution of maximal independence, and the distributions of the coefficients were approximated by exponential power densities. \n\nFigure 1: Linear decomposition of an observed spectral image patch into its basis functions. \n\nWe investigated samples of spectral images of natural scenes as illustrated in Figure 1. We analyzed a set of hyperspectral images [7] with a size of 256 x 256 pixels. Each pixel is represented by radiance values for 31 wavebands of 10 nm width, sampled in 10 nm steps between 400 and 700 nm. The pixel size corresponds to 0.056 x 0.056 deg of visual angle. The images were recorded around Bristol, either outdoors or inside the glass houses of Bristol Botanical Gardens. We chose eight of these images which had been obtained outdoors under apparently different illumination conditions. The vector of 31 spectral radiance values of each pixel was converted to a vector of 3 cone excitation values whose components were the inner products of the radiance vector with the vectors of L-, M-, and S-cone sensitivity values [8], respectively. From the entire image data set, 7x7 pixel image patches were chosen randomly, yielding 7x7x3 = 147-dimensional vectors. The learning process was done in 500 steps, each using a set of spectra of 40000 image patches, 5000 chosen randomly from each of the eight images. A set of basis functions for 7x7 pixel patches was obtained, with each pixel containing the logarithms of the excitations of the three human cone photoreceptors that represented the receptor signals in the human retina [8, 9]. To visualize the learned basis functions, we used the method by Ruderman et al. 
[9] and plotted for each basis function a 7 x 7 pixel matrix, with the color of each pixel indicating the combination of L-, M-, and S-cone responses as follows. The values for each patch were normalized to values between 0 and 255, with a cone excitation of zero corresponding to a value of 128. Thus, the R, G, and B components of each pixel represent the relative excitations of L, M, and S cones, respectively. To further illustrate the chromatic properties of the basis functions, we convert the L, M, S vector of each pixel to its projection onto the isoluminant plane of a cone-opponent color space similar to the color spaces of MacLeod and Boynton [10] and Derrington et al. [11]. In our plots, the horizontal axis corresponds to the response of an L-cone versus M-cone opponent mechanism, and the vertical axis corresponds to S-cone modulation. For each pixel of the basis functions, a point is plotted at its corresponding location in that color space. The colors of the points are the same as used for the pixels in the top part of the figure. Thus, although only the projection onto the isoluminant plane is shown, the third dimension (i.e., luminance) can be inferred from the brightness of the points. \n\nFigure 2a shows the learned ICA basis functions in a pseudocolor representation. Figure 2b shows the color space coordinates of the chromaticities of the pixels in each basis function. The PCA basis functions and their corresponding color space coordinates are shown in Figures 2c and 2d, respectively. Both representations are in order of decreasing L2-norm. The PCA results show a global spatial representation, and their opponent basis functions lie mostly along the coordinate axes of the cone-opponent color space. In addition, there are functions that imply mixtures of non-opponent colors. In contrast to the PCA basis functions, the ICA basis functions are localized and oriented. 
When ordered by decreasing L2-norm, achromatic basis functions tend to appear before chromatic basis functions. This reflects the fact that in the natural environment, luminance variations are generally larger than chromatic variations [7]. The achromatic basis functions are localized and oriented, similar to those found in the analysis of grayscale natural images [1, 2]. Most of the chromatic basis functions, particularly those with strong contributions, are color opponent, i.e., the chromaticities of their pixels lie roughly along a line through the origin of our color space. Most chromatic basis functions with relatively high contributions are modulated between light blue and dark yellow, in the plane defined by luminance and S-cone modulation. Those with lower L2-norm are highly localized, but still are mostly oriented. There are other chromatic basis functions with tilted orientations, corresponding to blue versus orange colors. The chromaticities of these basis functions occupy mainly the second and fourth quadrants. The basis functions with the lowest contributions are less strictly aligned in color space, but still tend to be color opponent, mostly along a bluish-green/orange direction. There are no basis functions with chromaticities along the horizontal axis, corresponding to pure L versus M cone opponency, like the PCA basis functions in Figure 2d [9]. The tilted orientations of the opponency axes most likely reflect the distribution of the chromaticities in our images. In natural images, L-M and S coordinates in our color space are negatively correlated [12]. ICA finds the directions that correspond to maximally decorrelated signals, i.e., it extracts the statistical structure of the inputs. PCA did not yield basis functions in these directions, probably because it is limited by the orthogonality constraint. 
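The projection onto the cone-opponent plane used for these chromaticity plots can be sketched as follows. The axis definitions below follow the MacLeod-Boynton convention (L versus M opponency and S modulation, each normalized by luminance L+M); the exact scaling used for Figure 2 is not specified in the text, so this is an illustrative assumption.

```python
import numpy as np

def cone_opponent_coords(lms):
    """Project LMS cone excitations onto a cone-opponent space:
    horizontal coordinate ~ L versus M opponency, vertical ~ S modulation,
    third coordinate ~ luminance (L+M). Axis scaling is illustrative only.
    """
    lms = np.asarray(lms, dtype=float)
    L, M, S = lms[..., 0], lms[..., 1], lms[..., 2]
    lum = L + M                 # luminance, MacLeod-Boynton style
    lm = (L - M) / lum          # L vs M opponent coordinate
    s = S / lum                 # S coordinate, normalized by luminance
    return np.stack([lm, s, lum], axis=-1)
```

Plotting the first coordinate against the second for every pixel of a basis function gives the isoluminant-plane view of Figure 2b/d, with luminance carried by the third coordinate.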
While it is known that the chromatic properties of neurons in the lateral geniculate nucleus (LGN) of primates correspond to variations along the axes of cone opponency ('cardinal axes') [11], cortical neurons show sensitivities for intermediate directions [13]. Since the results of PCA and ICA, respectively, match these differences qualitatively, we suspect that opponent coding along the 'cardinal directions' of cone opponency is used by the visual system to reliably transmit visual information to the cortex, where the information is recoded in order to better reflect the statistical structure of the environment [14]. \n\n3 Discussion \n\nThis result shows that the independence criterion alone is sufficient to learn efficient image codes. Although no sparseness constraint was used, the obtained coefficients are extremely sparse, i.e., the data x are encoded in the sources s in such a way that the coefficients of s are mostly around zero; there is only a small percentage of informative values (non-zero coefficients). From an information coding perspective, this implies that we can encode and decode the chromatic image patches with only a small percentage of the basis functions. In contrast, Gaussian densities are not sparsely distributed, and a large portion of the basis functions is required to represent the chromatic images. The normalized kurtosis is one measure of sparseness; the average kurtosis value was 19.7 for ICA and 6.6 for PCA. Interestingly, the basis functions in Figure 2a produced only sparse coefficients, except for basis function 7 (the green basis function), which resulted in a nearly uniform distribution, suggesting that this basis function is active almost all the time. The reason may be that a green color component is present in almost all image patches of the natural scenes. We repeated the experiment with different ICA methods and obtained similar results. 
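The normalized kurtosis used as a sparseness measure above can be computed per coefficient as in this minimal sketch. We use the excess-kurtosis convention (subtracting 3 so that a Gaussian scores zero); the exact normalization behind the quoted values 19.7 and 6.6 is not specified in the text.

```python
import numpy as np

def normalized_kurtosis(s):
    """Excess kurtosis E[(s - mean)^4] / var^2 - 3: zero for a Gaussian,
    positive for sparse (super-Gaussian) coefficient distributions."""
    s = np.asarray(s, dtype=float) - np.mean(s)
    return np.mean(s ** 4) / np.mean(s ** 2) ** 2 - 3.0
```

Under this convention a Laplacian density scores 3, and the sparse ICA coefficient histograms score well above the PCA ones.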
The basis functions obtained with the exponential power distributions or the simple Laplacian prior were statistically most efficient. In this sense, the basis functions that produce sparse distributions are statistically efficient codes. To quantitatively measure the encoding difference, we compared the coding efficiency of ICA and PCA using Shannon's theorem to obtain a lower bound on the number of bits required to encode a spatiochromatic pattern [4]. The average number of bits required to encode 40000 patches randomly selected from the 8 images in Figure 1 with a fixed noise coding precision of σ_x = 0.059 was 1.73 bits for ICA and 4.46 bits for PCA. Note that the encoding difference for achromatic image patches using ICA and PCA is about 20% in favor of ICA [4]. The encoding difference in the chromatic case is significantly higher (> 100%) and suggests that there is a large amount of chromatic redundancy in natural scenes. To verify our findings, we computed the average pairwise mutual information I in the original data (I_x = 0.1522), the PCA representation (I_PCA = 0.0123), and the ICA representation (I_ICA = 0.0093). ICA was able to further reduce the redundancy between its components, and its basis functions therefore represent more efficient codes. \n\nIn general, the ICA results support the argument that basis functions for efficient coding of chromatic natural images are non-orthogonal. In order to determine whether the color opponency is merely a result of correlation in the receptor signals due to the strong overlap of the photoreceptor sensitivities [15], we repeated the analysis, this time assuming hypothetical receptor sensitivities which do not overlap, but sample roughly the same regions as the L-, M-, and S-cones. We used rectangular sensitivities with absorptions between 420 and 480 nm (\"S\"), 490 and 550 nm (\"M\"), and 560 and 620 nm (\"L\"), respectively. 
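This control experiment can be set up as in the following sketch. The 31-band wavelength grid matches the images described in Section 2 (400-700 nm in 10 nm steps), and the inner-product conversion mirrors the cone-excitation computation described there; the function names and the unit-height rectangles are our own illustrative choices.

```python
import numpy as np

WAVELENGTHS = np.arange(400, 701, 10)   # 31 wavebands of 10 nm width

def rectangular_sensitivities():
    """Hypothetical non-overlapping receptor sensitivities, one row per
    receptor (order L, M, S): "S" 420-480 nm, "M" 490-550 nm, "L" 560-620 nm.
    """
    bands = {"S": (420, 480), "M": (490, 550), "L": (560, 620)}
    sens = np.zeros((3, WAVELENGTHS.size))
    for i, key in enumerate(("L", "M", "S")):
        lo, hi = bands[key]
        sens[i] = (WAVELENGTHS >= lo) & (WAVELENGTHS <= hi)  # 0/1 rectangle
    return sens

def receptor_excitations(radiance, sens):
    """Inner product of each pixel's 31-band radiance spectrum with the
    receptor sensitivity vectors (radiance: ... x 31 array)."""
    return np.asarray(radiance, dtype=float) @ sens.T
```

Swapping `sens` between these rectangles and measured cone fundamentals [8] is all that distinguishes the control analysis from the main one.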
The resulting basis functions were as strongly color opponent as for the case of overlapping cone sensitivities. This suggests that the correlations of radiance values in natural spectra are sufficiently high to require a color-opponent code in order to represent the chromatic structure efficiently. \n\nFigure 2: (a) The 147 ICA spatiochromatic basis functions (7 by 7 pixels and 3 colors), shown in order of decreasing L2-norm, from top to bottom and left to right. The R, G, and B values of the color of each pixel correspond to the relative excitation of L-, M-, and S-cones, respectively. (b) Chromaticities of the ICA basis functions, plotted in cone-opponent color space coordinates. Each dot represents the coordinate of a pixel of the respective basis function, projected onto the isoluminant plane. Luminance can be inferred from the brightness of the dot. Horizontal axes: L- versus M-cone variation. Vertical axes: S-cone variation. (c) The 147 PCA spatiochromatic basis functions and (d) the corresponding PCA chromaticities. 
In summary, our findings strongly suggest that color opponency is not a mere consequence of the overlapping cone spectral sensitivities, but rather an attempt to represent the intrinsic spatiochromatic structure of natural scenes in a statistically efficient manner. \n\nReferences \n\n[1] B. Olshausen and D. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996. \n\n[2] A. J. Bell and T. J. Sejnowski. The 'independent components' of natural scenes are edge filters. Vision Research, 37(23):3327-3338, 1997. \n\n[3] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. R. Soc. Lond. B, 265:359-366, 1998. \n\n[4] M. S. Lewicki and B. Olshausen. A probabilistic framework for the adaptation and comparison of image codes. Journal of the Optical Society of America A: Optics, Image Science and Vision, in press, 1999. \n\n[5] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159, 1995. \n\n[6] M. S. Lewicki. A flexible prior for independent component analysis. Neural Computation, submitted, 2000. \n\n[7] C. A. Párraga, G. Brelstaff, and T. Troscianko. Color and luminance information in natural scenes. Journal of the Optical Society of America A, 15:563-569, 1998. (http://www.crs4.it/~gjb/ftpJOSA.html). \n\n[8] A. Stockman, D. I. A. MacLeod, and N. E. Johnson. Spectral sensitivities of the human cones. Journal of the Optical Society of America A, 10:2491-2521, 1993. (http://www-cvrl.ucsd.edu). \n\n[9] D. L. Ruderman, T. W. Cronin, and C.-C. Chiao. Statistics of cone responses to natural images: Implications for visual coding. Journal of the Optical Society of America A, 15:2036-2045, 1998. \n\n[10] D. I. A. MacLeod and R. M. Boynton. 
Chromaticity diagram showing cone excitation by stimuli of equal luminance. Journal of the Optical Society of America, 69:1183-1186, 1979. \n\n[11] A. M. Derrington, J. Krauskopf, and P. Lennie. Chromatic mechanisms in lateral geniculate nucleus of macaque. Journal of Physiology, 357:241-265, 1984. \n\n[12] D. I. A. MacLeod and T. von der Twer. The pleistochrome: Optimal opponent codes for natural colors. Preprint, 1998. \n\n[13] P. Lennie, J. Krauskopf, and G. Sclar. Chromatic mechanisms in striate cortex of macaque. Journal of Neuroscience, 10:649-669, 1990. \n\n[14] D. J. Field. What is the goal of sensory coding? Neural Computation, 6:559-601, 1994. \n\n[15] G. Buchsbaum and A. Gottschalk. Trichromacy, opponent colours coding and optimum colour information transmission in the retina. Proceedings of the Royal Society London B, 220:89-113, 1983.", "award": [], "sourceid": 1909, "authors": [{"given_name": "Te-Won", "family_name": "Lee", "institution": null}, {"given_name": "Thomas", "family_name": "Wachtler", "institution": null}, {"given_name": "Terrence", "family_name": "Sejnowski", "institution": null}]}