{"title": "Color Constancy by Learning to Predict Chromaticity from Luminance", "book": "Advances in Neural Information Processing Systems", "page_first": 163, "page_last": 171, "abstract": "Color constancy is the recovery of true surface color from observed color, and requires estimating the chromaticity of scene illumination to correct for the bias it induces. In this paper, we show that the per-pixel color statistics of natural scenes---without any spatial or semantic context---can by themselves be a powerful cue for color constancy. Specifically, we describe an illuminant estimation method that is built around a classifier for identifying the true chromaticity of a pixel given its luminance (absolute brightness across color channels). During inference, each pixel's observed color restricts its true chromaticity to those values that can be explained by one of a candidate set of illuminants, and applying the classifier over these values yields a distribution over the corresponding illuminants. A global estimate for the scene illuminant is computed through a simple aggregation of these distributions across all pixels. We begin by simply defining the luminance-to-chromaticity classifier by computing empirical histograms over discretized chromaticity and luminance values from a training set of natural images. These histograms reflect a preference for hues corresponding to smooth reflectance functions, and for achromatic colors in brighter pixels. Despite its simplicity, the resulting estimation algorithm outperforms current state-of-the-art color constancy methods. Next, we propose a method to learn the luminance-to-chromaticity classifier end-to-end. Using stochastic gradient descent, we set chromaticity-luminance likelihoods to minimize errors in the final scene illuminant estimates on a training set. 
This leads to further improvements in accuracy, most significantly in the tail of the error distribution.", "full_text": "Color Constancy by Learning to Predict\n\nChromaticity from Luminance\n\nAyan Chakrabarti\n\nToyota Technological Institute at Chicago\n6045 S. Kenwood Ave., Chicago, IL 60637\n\nayanc@ttic.edu\n\nAbstract\n\nColor constancy is the recovery of true surface color from observed color, and\nrequires estimating the chromaticity of scene illumination to correct for the bias\nit induces.\nIn this paper, we show that the per-pixel color statistics of natural\nscenes\u2014without any spatial or semantic context\u2014can by themselves be a pow-\nerful cue for color constancy. Speci\ufb01cally, we describe an illuminant estimation\nmethod that is built around a \u201cclassi\ufb01er\u201d for identifying the true chromaticity of\na pixel given its luminance (absolute brightness across color channels). During\ninference, each pixel\u2019s observed color restricts its true chromaticity to those val-\nues that can be explained by one of a candidate set of illuminants, and applying\nthe classi\ufb01er over these values yields a distribution over the corresponding illumi-\nnants. A global estimate for the scene illuminant is computed through a simple\naggregation of these distributions across all pixels. We begin by simply de\ufb01n-\ning the luminance-to-chromaticity classi\ufb01er by computing empirical histograms\nover discretized chromaticity and luminance values from a training set of natural\nimages. These histograms re\ufb02ect a preference for hues corresponding to smooth\nre\ufb02ectance functions, and for achromatic colors in brighter pixels. Despite its\nsimplicity, the resulting estimation algorithm outperforms current state-of-the-art\ncolor constancy methods. Next, we propose a method to learn the luminance-\nto-chromaticity classi\ufb01er \u201cend-to-end\u201d. 
Using stochastic gradient descent, we set chromaticity-luminance likelihoods to minimize errors in the final scene illuminant estimates on a training set. This leads to further improvements in accuracy, most significantly in the tail of the error distribution.\n\n1 Introduction\n\nThe spectral distribution of light reflected off a surface is a function of an intrinsic material property of the surface\u2014its reflectance\u2014and also of the spectral distribution of the light illuminating the surface. Consequently, the observed color of the same surface under different illuminants in different images will be different. To be able to reliably use color computationally for identifying materials and objects, researchers are interested in deriving an encoding of color from an observed image that is invariant to changing illumination. This task is known as color constancy, and requires resolving the ambiguity between illuminant and surface colors in an observed image. Since both of these quantities are unknown, much of color constancy research is focused on identifying models and statistical properties of natural scenes that are informative for color constancy. While psychophysical experiments have demonstrated that the human visual system is remarkably successful at achieving color constancy [1], it remains a challenging task computationally.\n\nEarly color constancy algorithms were based on relatively simple models for pixel colors. For example, the gray world method [2] simply assumed that the average true intensities of different color channels across all pixels in an image would be equal, while the white-patch retinex method [3]\n\n1\n\n\fassumed that the true color of the brightest pixels in an image is white. Most modern color constancy methods, however, are based on more complex reasoning with higher-order image features. Many methods [4, 5, 6] use models for image derivatives instead of individual pixels. 
Others are based on recognizing and matching image segments to those in a training set to recover true color [7]. A recent method proposes the use of a multi-layer convolutional neural network (CNN) to regress from image patches to illuminant color. There are also many \u201ccombination-based\u201d color constancy algorithms that combine illuminant estimates from a number of simpler \u201cunitary\u201d algorithms [8, 9, 10, 11], sometimes using image features to give higher weight to the outputs of some subset of methods.\n\nIn this paper, we demonstrate that by appropriately modeling and reasoning with the statistics of individual pixel colors, one can computationally recover illuminant color with high accuracy. We consider individual pixels in isolation, where the color constancy task reduces to discriminating between the possible choices of true color for the pixel that are feasible given the observed color and a candidate set of illuminants. Central to our method is a function that gives us the relative likelihoods of these true colors, and therefore a distribution over the corresponding candidate illuminants. Our global estimate for the scene illuminant is then computed by simply aggregating these distributions across all pixels in the image.\n\nWe formulate the likelihood function as one that measures the conditional likelihood of true pixel chromaticity given observed luminance, in part to be agnostic to the scalar (i.e., color channel-independent) ambiguity in observed color intensities. Moreover, rather than committing to a parametric form, we quantize the space of possible chromaticity and luminance values, and define the function over this discrete domain. We begin by setting the conditional likelihoods purely empirically, based simply on the histograms of true color values over all pixels in all images across a training set. 
Even with this purely empirical approach, our estimation algorithm yields estimates with higher accuracy than current state-of-the-art methods. Then, we investigate learning the per-pixel belief function by optimizing an objective based on the accuracy of the final global illuminant estimate. We carry out this optimization using stochastic gradient descent, and using a sub-sampling approach (similar to \u201cdropout\u201d [12]) to improve generalization beyond the training set. This further improves estimation accuracy, without adding to the computational cost of inference.\n\n2 Preliminaries\n\nAssuming Lambertian reflection, the spectral distribution of light reflected by a material is a product of the distribution of the incident light and the material\u2019s reflectance function. The color intensity vector v(n) \u2208 R^3 recorded by a tri-chromatic sensor at each pixel n is then given by\n\nv(n) = \u222b \u03ba(n, \u03bb) \u2113(n, \u03bb) s(n) \u03a0(\u03bb) d\u03bb,\n\n(1)\n\nwhere \u03ba(n, \u03bb) is the reflectance at n, \u2113(n, \u03bb) is the spectral distribution of the incident illumination, s(n) is a geometry-dependent shading factor, and \u03a0(\u03bb) \u2208 R^3 denotes the spectral sensitivities of the color sensors. Color constancy is typically framed as the task of computing from v(n) the corresponding color intensities x(n) \u2208 R^3 that would have been observed under some canonical illuminant \u2113ref (typically chosen to be \u2113ref(\u03bb) = 1). We will refer to x(n) as the \u201ctrue color\u201d at n.\n\nSince (1) involves a projection of the full incident light spectrum on to the three filters \u03a0(\u03bb), it is not generally possible to recover x(n) from v(n) even with knowledge of the illuminant \u2113(n, \u03bb). 
However, a commonly adopted approximation (shown to be reasonable under certain assumptions [13]) is to relate the true and observed colors x(n) and v(n) by a simple per-channel adaptation:\n\nv(n) = m(n) \u25e6 x(n),\n\n(2)\n\nwhere \u25e6 refers to the element-wise Hadamard product, and m(n) \u2208 R^3 depends on the illuminant \u2113(n, \u03bb) (for \u2113ref, m = [1, 1, 1]^T). With some abuse of terminology, we will refer to m(n) as the illuminant in the remainder of the paper. Moreover, we will focus on the single-illuminant case in this paper, and assume m(n) = m, \u2200n in an image. Our goal during inference will be to estimate this global illuminant m from the observed image v(n). The true color image x(n) can then simply be recovered as m^\u22121 \u25e6 v(n), where m^\u22121 \u2208 R^3 denotes the element-wise inverse of m.\n\nNote that color constancy algorithms seek to resolve the ambiguity between m and x(n) in (2) only up to a channel-independent scalar factor. This is because scalar ambiguities show up in m between \u2113 and \u2113ref due to light attenuation, between x(n) and \u03ba(n) due to the shading factor s(n), and in the observed image v(n) itself due to varying exposure settings. Therefore, the performance metric typically used is the angular error cos^\u22121( m^T \u00afm / (\u2016m\u20162 \u2016\u00afm\u20162) ) between the true and estimated illuminant vectors m and \u00afm.\n\nDatabase. For training and evaluation, we use the database of 568 natural indoor and outdoor images captured under various illuminants by Gehler et al. [14]. We use the version from Shi and Funt [15] that contains linear images (without gamma correction) generated from the RAW camera data. The database contains images captured with two different cameras (86 images with a Canon 1D, and 482 with a Canon 5D). Each image contains a color checker chart placed in the image, with its position manually labeled. 
The colors of the gray squares in the chart are taken to be the value of the true illuminant m for each image, which can then be used to correct the image to get true colors at each pixel (of course, only up to scale). The chart is masked out during evaluation. We use k-fold cross-validation over this dataset in our experiments. Each fold contains images from both cameras corresponding to one of k roughly-equal partitions of each camera\u2019s image set (ordered by file name/order of capture). Estimates for images in each fold are based on training only with data from the remaining folds. We report results with three- and ten-fold cross-validation. These correspond to average training set sizes of 379 and 511 images respectively.\n\n3 Color Constancy with Pixel-wise Chromaticity Statistics\n\nA color vector x \u2208 R^3 can be characterized in terms of (1) its luminance \u2016x\u20161, or absolute brightness across color channels; and (2) its chromaticity, which is a measure of the relative ratios between intensities in different channels. While there are different ways of encoding chromaticity, we will do so in terms of the unit vector \u02c6x = x/\u2016x\u20162 in the direction of x. Note that since intensities cannot be negative, \u02c6x is restricted to lie on the non-negative eighth of the unit sphere S^2_+. Remember from Sec. 2 that our goal is to resolve the ambiguity between the true colors x(n) and the illuminant m only up to scale. In other words, we need only estimate the illuminant chromaticity \u02c6m and true chromaticities \u02c6x(n) from the observed image v(n), which we can relate from (2) as\n\n\u02c6x(n) = x(n) / \u2016x(n)\u20162 = ( \u02c6m^\u22121 \u25e6 v(n) ) / \u2016\u02c6m^\u22121 \u25e6 v(n)\u20162 \u225c g(v(n), \u02c6m).\n\n(3)\n\nA key property of natural illuminant chromaticities is that they are known to take a fairly restricted set of values, close to a one-dimensional locus predicted by Planck\u2019s radiation law [16]. 
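To make the diagonal model and the evaluation metric concrete, here is a minimal NumPy sketch of the mapping g(v, m̂) from Eq. (3) and of the angular-error metric from Sec. 2. This is an illustration under the stated model, not the paper's released implementation; the function names are my own.

```python
import numpy as np

def g(v, m_hat):
    """Eq. (3): true chromaticity implied by observed color v under a
    candidate illuminant chromaticity m_hat (both positive 3-vectors)."""
    x = v / m_hat                     # element-wise m^-1 o v
    return x / np.linalg.norm(x)      # unit vector on S^2_+

def angular_error(m, m_bar):
    """Angular error in degrees between true and estimated illuminants;
    invariant to the scale of either vector."""
    c = np.dot(m, m_bar) / (np.linalg.norm(m) * np.linalg.norm(m_bar))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
```

Note that g(v, m_hat) is unchanged if v is scaled by a constant, matching the channel-independent scalar ambiguity discussed above.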
To be able to exploit this, we denote M = { \u02c6mi }, i = 1 . . . M, as the set of possible values for illuminant chromaticity \u02c6m, and construct it from a training set. Specifically, we quantize^1 the chromaticity vectors { \u02c6m^t }, t = 1 . . . T, of the illuminants in the training set, and let M be the set of unique chromaticity values. Additionally, we define a \u201cprior\u201d bi = log(ni/T) over this candidate set, based on the number ni of training illuminants that were quantized to \u02c6mi.\n\nGiven the observed color v(n) at a single pixel n, the ambiguity in \u02c6m across the illuminant set M translates to a corresponding ambiguity in the true chromaticity \u02c6x(n) over the set {g(v(n), \u02c6mi)}i. Figure 1(a) illustrates this ambiguity for a few different observed colors v. We note that while there is significant angular deviation within the set of possible true chromaticity values for any observed color, values in each set lie close to a one-dimensional locus in chromaticity space. This suggests that the illuminants in our training set are indeed a good fit to Planck\u2019s law^2.\n\nThe goal of our work is to investigate the extent to which we can resolve the above ambiguity in true chromaticity on a per-pixel basis, without having to reason about the pixel\u2019s spatial neighborhood or semantic context. Our approach is based on computing a likelihood distribution over the possible values of \u02c6x(n), given the observed luminance \u2016v(n)\u20161. But as mentioned in Sec. 2, there is considerable ambiguity in the scale of observed color intensities. We address this partially by applying a simple per-image global normalization to the observed luminance to define y(n) = \u2016v(n)\u20161 / median{\u2016v(n\u2032)\u20161}n\u2032. This very roughly compensates for variations across images due to exposure settings, illuminant brightness, etc. However, note that since the normalization is global, it does not compensate for variations due to shading.\n\n^1 Quantization is over uniformly sized bins in S^2_+. See supplementary material for details.\n^2 In fact, the chromaticities appear to lie on two curves that are slightly separated from each other. This separation is likely due to differences in the sensor responses of the two cameras in the Gehler-Shi dataset.\n\n\fFigure 1: Color Constancy with Per-pixel Chromaticity-luminance distributions of natural scenes. (a) Ambiguity in true chromaticity given observed color: each set of points corresponds to the possible true chromaticity values (location in S^2_+, see legend) consistent with the pixel\u2019s observed chromaticity (color of the points) and different candidate illuminants \u02c6mi. (b) Distributions over different values for true chromaticity of a pixel conditioned on its observed luminance, computed as empirical histograms over the training set. Values y are normalized per-image by the median luminance value over all pixels. (c) Corresponding distributions learned with end-to-end training to maximize accuracy of overall illuminant estimation.\n\nThe central component of our inference method is a function L[\u02c6x, y] that encodes the belief that a pixel with normalized observed luminance y has true chromaticity \u02c6x. This function is defined over a discrete domain by quantizing both chromaticity and luminance values: we clip luminance values y to four (i.e., four times the median luminance of the image) and quantize them into twenty equal-sized bins; and for chromaticity \u02c6x, we use a much finer quantization with 2^14 equal-sized bins in S^2_+ (see supplementary material for details). 
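The per-image luminance normalization and binning just described can be sketched as follows. The 20-bin quantization over [0, 4) (clipping at four times the median luminance) follows the text, but the function names are illustrative, and the paper's much finer 2^14-bin chromaticity quantization on S^2_+ is omitted here.

```python
import numpy as np

def normalized_luminance(v):
    """y(n) = ||v(n)||_1 / median ||v(n')||_1, clipped below 4.
    v: array of shape (N, 3) of non-negative pixel colors."""
    lum = v.sum(axis=-1)               # per-pixel L1 luminance
    y = lum / np.median(lum)           # global, per-image normalization
    return np.minimum(y, 4.0 - 1e-6)   # clip to [0, 4)

def luminance_bin(y, n_bins=20):
    """Quantize normalized luminance into twenty equal-sized bins over [0, 4)."""
    return np.minimum((y / 4.0 * n_bins).astype(int), n_bins - 1)
```

Since the normalization divides by the per-image median, a pixel at exactly the median luminance has y = 1 and falls in bin 5 of 20.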
In this section, we adopt a purely empirical approach and define L[\u02c6x, y] = log ( N\u02c6x,y / \u03a3\u02c6x\u2032 N\u02c6x\u2032,y ), where N\u02c6x,y is the number of pixels, across all images in a training set, that have true chromaticity \u02c6x and observed luminance y.\n\nWe visualize these empirical versions of L[\u02c6x, y] for a subset of the luminance quantization levels in Fig. 1(b). We find that in general, desaturated chromaticities with similar intensity values in all color channels are most common. This is consistent with findings of statistical analysis of natural spectra [17], which shows the \u201cDC\u201d component (flat across wavelength) to be the one with most variance. We also note that the concentration of the likelihood mass in these chromaticities increases for higher values of luminance y. This phenomenon is also predicted by traditional intuitions in color science: materials are brightest when they reflect most of the incident light, which typically occurs when they have a flat reflectance function with all values of \u03ba(\u03bb) close to one. Indeed, this is what forms the basis of the white-patch retinex method [3]. Amongst saturated colors, we find that hues which combine green with either red or blue occur more frequently than primary colors, with pure green and combinations of red and blue being the least common. 
This is consistent with findings that reflectance functions are usually smooth (PCA on pixel spectra in [17] revealed a Fourier-like basis). Both saturated green and red-blue combinations would require the reflectance to have either a sharp peak or crest, respectively, in the middle of the visible spectrum.\n\nWe now describe a method that exploits the belief function L[\u02c6x, y] for illuminant estimation. Given the observed color v(n) at a pixel n, we can obtain a distribution {L[g(v(n), \u02c6mi), y(n)]}i over the set of possible true chromaticity values {g(v(n), \u02c6mi)}i, which can also be interpreted as a distribution over the corresponding illuminants \u02c6mi. We then simply aggregate these distributions across all pixels n in the image, and define the global probability of \u02c6mi being the scene illuminant m as pi = exp(li) / \u03a3i\u2032 exp(li\u2032), where\n\nli = (\u03b1/N) \u03a3n L[g(v(n), \u02c6mi), y(n)] + \u03b2 bi,\n\n(4)\n\nN is the total number of pixels in the image, and \u03b1 and \u03b2 are scalar parameters. The final illuminant chromaticity estimate \u00afm is then computed as\n\n\u00afm = arg min{m\u2032 : \u2016m\u2032\u20162 = 1} E[cos^\u22121(m^T m\u2032)] \u2248 arg max{m\u2032 : \u2016m\u2032\u20162 = 1} E[m^T m\u2032] = \u03a3i pi \u02c6mi / \u2016\u03a3i pi \u02c6mi\u20162.\n\n(5)\n\nNote that (4) also incorporates the prior bi over illuminants. We set the parameters \u03b1 and \u03b2 using a grid search, to values that minimize mean illuminant estimation error over the training set. The primary computational cost of inference is in computing the values of {li}. We pre-compute values of g(\u02c6x, \u02c6m) using (3) over the discrete domain of quantized chromaticity values for \u02c6x and the candidate illuminant set M for \u02c6m. Therefore, computing each li essentially only requires the addition of N numbers from a look-up table. 
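Assuming the chromaticity and luminance bin indices have already been precomputed via the look-up of g(v(n), m̂i), the aggregation of Eq. (4) and the estimate of Eq. (5) reduce to a few array operations. The sketch below is illustrative only; the array names and shapes are assumptions, not the paper's code.

```python
import numpy as np

def estimate_illuminant(L, chrom_idx, lum_idx, M_cand, b, alpha=1.0, beta=1.0):
    """L: belief table over (chromaticity bin, luminance bin);
    chrom_idx: (N, M) chromaticity bin of g(v(n), m_i) per pixel/candidate;
    lum_idx: (N,) luminance bin per pixel; M_cand: (M, 3) candidate
    chromaticities; b: (M,) log-prior. Returns unit-norm estimate (Eq. 5)."""
    N, M = chrom_idx.shape
    # l_i = (alpha / N) * sum_n L[g(v(n), m_i), y(n)] + beta * b_i   (Eq. 4)
    l = alpha / N * L[chrom_idx, lum_idx[:, None]].sum(axis=0) + beta * b
    p = np.exp(l - l.max())            # stable softmax over candidates
    p /= p.sum()
    m_bar = p @ M_cand                 # probability-weighted average (Eq. 5)
    return m_bar / np.linalg.norm(m_bar)
```

As in the text, each li is just a sum of N table look-ups, and candidates are independent, so the M summations parallelize trivially.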
We need to do this for all M = |M| illuminants, where summations for different illuminants can be carried out in parallel. Our implementation takes roughly 0.3 seconds for a 9 mega-pixel image, on a modern Intel 3.3GHz CPU with 6 cores, and is available at http://www.ttic.edu/chakrabarti/chromcc/.\n\nThis empirical version of our approach bears some similarity to the Bayesian method of [14] that is based on priors for illuminants, and for the likelihood of different true reflectance values being present in a scene. However, the key difference is our modeling of true chromaticity conditioned on luminance that explicitly makes estimation agnostic to the absolute scale of intensity values. We also reason with all pixels, rather than the set of unique colors in the image.\n\nExperimental Results. Table 1 compares the performance of illuminant estimation with our method (see rows labeled \u201cEmpirical\u201d) to the current state-of-the-art, using different quantiles of angular error across the Gehler-Shi database [14, 15]. Results for other methods are from the survey by Li et al. [18]. (See the supplementary material for comparisons to some other recent methods).\n\nWe show results with both three- and ten-fold cross-validation. We find that our errors with three-fold cross-validation have lower mean, median, and tri-mean values than those of the best-performing state-of-the-art method from [8], which combines illuminant estimates from twelve different \u201cunitary\u201d color-constancy methods (many of which are also listed in Table 1) using support-vector regression. The improvement in error is larger with respect to the other combination methods [8, 9, 10, 11], as well as those based on the statistics of image derivatives [4, 5, 6]. 
Moreover, since our method has more parameters than most previous algorithms (L[\u02c6x, y] has 2^14 \u00d7 20 \u2248 300k entries), it is likely to benefit from more training data.\n\n\fTable 1: Quantiles of Angular Error for Different Methods on the Gehler-Shi Database [14, 15]\n\nMethod | Mean | Median | Tri-mean | 25%-ile | 75%-ile | 90%-ile\nBayesian [14] | 6.74\u25e6 | 5.14\u25e6 | 5.54\u25e6 | 2.42\u25e6 | 9.47\u25e6 | 14.71\u25e6\nGamut Mapping [20] | 6.00\u25e6 | 3.98\u25e6 | 4.52\u25e6 | 1.71\u25e6 | 8.42\u25e6 | 14.74\u25e6\nDeriv. Gamut Mapping [4] | 5.96\u25e6 | 3.83\u25e6 | 4.32\u25e6 | 1.68\u25e6 | 7.95\u25e6 | 14.72\u25e6\nGray World [2] | 4.77\u25e6 | 3.63\u25e6 | 3.92\u25e6 | 1.81\u25e6 | 6.63\u25e6 | 10.59\u25e6\nGray Edge(1,1,6) [5] | 4.19\u25e6 | 3.28\u25e6 | 3.54\u25e6 | 1.87\u25e6 | 5.72\u25e6 | 8.60\u25e6\nSV-Regression [21] | 4.14\u25e6 | 3.23\u25e6 | 3.35\u25e6 | 1.68\u25e6 | 5.27\u25e6 | 8.87\u25e6\nSpatio-Spectral [6] | 3.99\u25e6 | 3.24\u25e6 | 3.45\u25e6 | 2.38\u25e6 | 4.97\u25e6 | 7.50\u25e6\nScene Geom. Comb. [9] | 4.56\u25e6 | 3.15\u25e6 | 3.46\u25e6 | 1.41\u25e6 | 6.12\u25e6 | 10.39\u25e6\nNearest-30% Comb. [10] | 4.26\u25e6 | 2.95\u25e6 | 3.19\u25e6 | 1.49\u25e6 | 5.39\u25e6 | 9.67\u25e6\nClassifier-based Comb. [11] | 3.83\u25e6 | 2.75\u25e6 | 2.93\u25e6 | 1.34\u25e6 | 4.89\u25e6 | 8.19\u25e6\nNeural Comb. (ELM) [8] | 3.43\u25e6 | 2.37\u25e6 | 2.62\u25e6 | 1.21\u25e6 | 4.53\u25e6 | 6.97\u25e6\nSVR-based Comb. [8] | 2.98\u25e6 | 1.97\u25e6 | 2.35\u25e6 | 1.13\u25e6 | 4.33\u25e6 | 6.37\u25e6\nProposed (3-Fold) Empirical | 2.89\u25e6 | 1.89\u25e6 | 2.15\u25e6 | 1.15\u25e6 | 3.68\u25e6 | 6.24\u25e6\nProposed (3-Fold) End-to-end Trained | 2.56\u25e6 | 1.67\u25e6 | 1.89\u25e6 | 0.91\u25e6 | 3.30\u25e6 | 5.56\u25e6\nProposed (10-Fold) Empirical | 2.55\u25e6 | 1.58\u25e6 | 1.83\u25e6 | 0.85\u25e6 | 3.30\u25e6 | 5.74\u25e6\nProposed (10-Fold) End-to-end Trained | 2.20\u25e6 | 1.37\u25e6 | 1.53\u25e6 | 0.69\u25e6 | 2.68\u25e6 | 4.89\u25e6\n\n
We find this to indeed be the case, and observe a considerable decrease in error quantiles when we switch to ten-fold cross-validation.\n\nFigure 2 shows estimation results with our method for a few sample images. For each image, we show the input image (indicating the ground truth color chart being masked out) and the output image with colors corrected by the global illuminant estimate. To visualize the quality of contributions from individual pixels, we also show a map of angular errors for illuminant estimates from individual pixels. These estimates are based on values of li computed by restricting the summation in (4) to individual pixels. We find that even these pixel-wise estimates are fairly accurate for a lot of pixels, even when its true color is saturated (see cart in first row). Also, to evaluate the weight of these per-pixel distributions to the global li, we show a map of their variance on a per-pixel basis. As expected from Fig. 1(b), we note higher variances in relatively brighter pixels. The image in the last row represents one of the poorest estimates across the entire dataset (higher than 90%-ile). Note that much of the image is in shadow, and contains only a few distinct (and likely atypical) materials.\n\n4 Learning L[\u02c6x, y] End-to-end\n\nWhile the empirical approach in the previous section would be optimal if pixel chromaticities in a typical image were in fact i.i.d., that is clearly not the case. Therefore, in this section we propose an alternative approach to setting the beliefs in L[\u02c6x, y], one that optimizes for the accuracy of the final global illuminant estimate. However, unlike previous color constancy methods that explicitly model statistical co-dependencies between pixels\u2014for example, by modeling spatial derivatives [4, 5, 6], or learning functions on whole-image histograms [21]\u2014we retain the overall parametric \u201cform\u201d by which we compute the illuminant in (4). 
Therefore, even though L[\u02c6x, y] itself is learned through knowledge of co-occurrence of chromaticities in natural images, estimation of the illuminant during inference is still achieved through a simple aggregation of per-pixel distributions.\n\nSpecifically, we set the entries of L[\u02c6x, y] to minimize a cost function C over a set of training images:\n\nC(L) = \u03a3t=1..T C^t(L),  C^t = \u03a3i cos^\u22121( \u02c6mi^T \u02c6m^t ) pi^t,\n\n(6)\n\nwhere \u02c6m^t is the true illuminant chromaticity of the t-th training image, and pi^t is computed from the observed colors v^t(n) using (4). We augment the training data available to us by \u201cre-lighting\u201d each image with different illuminants from the training set. We use the original image set and six re-lit copies for training, and use a seventh copy for validation.\n\n\f[Figure 2 shows, for three sample images, columns for Input+Mask, Ground Truth, Global Estimate, Per-Pixel Estimate Error, and Belief Variance, with rows for the Empirical and End-to-end versions. Global estimation errors: 0.56\u25e6 (Empirical) vs. 0.24\u25e6 (End-to-end) for the first image, 4.32\u25e6 vs. 3.15\u25e6 for the second, and 16.22\u25e6 vs. 10.31\u25e6 for the third.]\n\nFigure 2: Estimation Results on Sample Images. Along with output images corrected with the global illuminant estimate from our methods, we also visualize illuminant information extracted at a local level. We show a map of the angular error of pixel-wise illuminant estimates (i.e., computed with li based on distributions from only a single pixel). We also show a map of the variance Var({li}i) of these beliefs, to gauge the weight of their contributions to the global illuminant estimate.\n\nWe use stochastic gradient descent to minimize (6). 
We initialize L to empirical values as described in the previous section (for convenience, we multiply the empirical values by \u03b1, and then set \u03b1 = 1 for computing li), and then consider individual images from the training set at each iteration. We make multiple passes through the training set, and at each iteration, we randomly sub-sample the pixels from each training image. Specifically, we only retain 1/128 of the total pixels in the image by randomly sub-sampling 16 \u00d7 16 patches at a time. This approach, which can be interpreted as being similar to \u201cdropout\u201d [12], prevents over-fitting and improves generalization.\n\nDerivatives of the cost function C^t with respect to the current values of beliefs L[\u02c6x, y] are given by\n\n\u2202C^t / \u2202L[\u02c6x, y] = (1/N) \u03a3i \u03a3n \u03b4( g(v^t(n), \u02c6mi) = \u02c6x ) \u03b4( y^t(n) = y ) \u00d7 \u2202C^t / \u2202li^t,\n\n(7)\n\nwhere\n\n\u2202C^t / \u2202li^t = pi^t ( cos^\u22121( \u02c6mi^T \u02c6m^t ) \u2212 C^t ).\n\n(8)\n\nWe use momentum to update the values of L[\u02c6x, y] at each iteration based on these derivatives as\n\nL[\u02c6x, y] = L[\u02c6x, y] \u2212 L\u2207[\u02c6x, y],  L\u2207[\u02c6x, y] = r \u2202C^t / \u2202L[\u02c6x, y] + \u00b5 L\u2207\u2217[\u02c6x, y],\n\n(9)\n\nwhere L\u2207\u2217[\u02c6x, y] is the previous update value, r is the learning rate, and \u00b5 is the momentum factor. In our experiments, we set \u00b5 = 0.9, run stochastic gradient descent for 20 epochs with r = 100, and another 10 epochs with r = 10. We retain the values of L from each epoch, and our final output is the version that yields the lowest mean illuminant estimation error on the validation set.\n\nWe show the belief values learned in this manner in Fig. 1(c). 
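A minimal sketch of one training step, with the gradient accumulation of Eqs. (7)-(8) and the momentum update of Eq. (9). The variable names and the dense-table inputs are assumptions for illustration; this is not the paper's training code.

```python
import numpy as np

def grad_table(L_shape, chrom_idx, lum_idx, p, ang_err, C_t):
    """dC^t/dL[x,y] per Eqs. (7)-(8). chrom_idx: (N, M) chromaticity bins
    of g(v^t(n), m_i); lum_idx: (N,) luminance bins; p: (M,) posterior
    over candidates; ang_err: (M,) angles cos^-1(m_i^T m^t); C_t: cost."""
    dC_dl = p * (ang_err - C_t)                    # Eq. (8)
    grad = np.zeros(L_shape)
    N, M = chrom_idx.shape
    for i in range(M):
        # scatter-add dC/dl_i into every (chromaticity, luminance) bin
        # touched by candidate i, weighted by 1/N as in Eq. (7)
        np.add.at(grad, (chrom_idx[:, i], lum_idx), dC_dl[i] / N)
    return grad

def sgd_momentum_step(L, grad, vel, r=100.0, mu=0.9):
    """Eq. (9): vel <- r * grad + mu * vel_prev;  L <- L - vel."""
    vel = r * grad + mu * vel
    return L - vel, vel
```

Because Σi pi = 1 and C^t = Σi pi·ang_err_i, the per-candidate terms of Eq. (8) sum to zero, so the gradient redistributes belief between bins rather than changing its total.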
Notice that although they retain the overall biases towards desaturated colors and combined green-red and green-blue hues, they are less \u201csmooth\u201d than their empirical counterparts in Fig. 1(b)\u2014in many instances, there are sharp changes in the values L[\u02c6x, y] for small changes in chromaticity. While harder to interpret, we hypothesize that these variations result from shifting beliefs of specific (\u02c6x, y) pairs to their neighbors, when they correspond to incorrect choices within the ambiguous set of specific observed colors.\n\nExperimental Results. We also report errors when using these end-to-end trained versions of the belief function L in Table 1, and find that they lead to an appreciable reduction in error in comparison to their empirical counterparts. Indeed, the errors with end-to-end training using three-fold cross-validation begin to approach those of the empirical version with ten-fold cross-validation, which has access to much more training data. Also note that the most significant improvements (for both three- and ten-fold cross-validation) are in \u201coutlier\u201d performance, i.e., in the 75 and 90%-ile error values. Color constancy methods perform worst on images that are dominated by a small number of materials with ambiguous chromaticity, and our results indicate that end-to-end training increases the reliability of our estimation method in these cases.\n\nWe also include results for the end-to-end case for the example images in Figure 2. For all three images, there is an improvement in the global estimation error. More interestingly, we see that the per-pixel error and variance maps now have more high-frequency variation, since L now reacts more sharply to slight chromaticity changes from pixel to pixel. Moreover, we see that a larger fraction of pixels generate fairly accurate estimates by themselves (blue shirt in row 2). 
There is also a higher disparity in belief variance, including within regions that look visually homogeneous in the input, indicating that the global estimate is now more heavily influenced by a smaller fraction of pixels.

5 Conclusion and Future Work

In this paper, we introduced a new color constancy method that is based on a conditional likelihood function for the true chromaticity of a pixel, given its luminance. We proposed two approaches to learning this function. The first was based purely on empirical pixel statistics, while the second was based on maximizing the accuracy of the final illuminant estimate. Both versions were found to outperform state-of-the-art color constancy methods, including those that employ more complex features and semantic reasoning. While we assumed a single global illuminant in this paper, the underlying per-pixel reasoning can likely be extended to the multiple-illuminant case, especially since, as we saw in Fig. 2, our method was often able to extract reasonable illuminant estimates from individual pixels. Another useful direction for future research is to investigate the benefits of using likelihood functions conditioned on lightness (estimated using an intrinsic image decomposition method) instead of normalized luminance. This would factor out the spatially-varying scalar ambiguity caused by shading, which could lead to more informative distributions.

Acknowledgments

We thank the authors of [18] for providing estimation results of other methods for comparison. The author was supported by a gift from Adobe.

References

[1] D.H. Brainard and A. Radonjic. Color constancy. In The New Visual Neurosciences, 2014.
[2] G. Buchsbaum. A spatial processor model for object colour perception. J. Franklin Inst., 1980.
[3] E.H. Land. The retinex theory of color vision. Scientific American, 1971.
[4] A. Gijsenij, T. Gevers, and J. van de Weijer. Generalized gamut mapping using image derivative structures for color constancy. IJCV, 2010.
[5] J. van de Weijer, T. Gevers, and A. Gijsenij. Edge-based color constancy. IEEE Trans. Image Proc., 2007.
[6] A. Chakrabarti, K. Hirakawa, and T. Zickler. Color constancy with spatio-spectral statistics. PAMI, 2012.
[7] H.R.V. Joze and M.S. Drew. Exemplar-based color constancy and multiple illumination. PAMI, 2014.
[8] B. Li, W. Xiong, and D. Xu. A supervised combination strategy for illumination chromaticity estimation. ACM Trans. Appl. Percept., 2010.
[9] R. Lu, A. Gijsenij, T. Gevers, V. Nedovic, and D. Xu. Color constancy using 3D scene geometry. In ICCV, 2009.
[10] S. Bianco, F. Gasparini, and R. Schettini. Consensus-based framework for illuminant chromaticity estimation. J. Electron. Imag., 2008.
[11] S. Bianco, G. Ciocca, C. Cusano, and R. Schettini. Automatic color constancy algorithm selection and combination. Pattern Recognition, 2010.
[12] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 2014.
[13] H. Chong, S. Gortler, and T. Zickler. The von Kries hypothesis and a basis for color constancy. In Proc. ICCV, 2007.
[14] P.V. Gehler, C. Rother, A. Blake, T. Minka, and T. Sharp. Bayesian color constancy revisited. In CVPR, 2008.
[15] L. Shi and B. Funt. Re-processed version of the Gehler color constancy dataset of 568 images. Accessed from http://www.cs.sfu.ca/~colour/data/.
[16] D.B. Judd, D.L. MacAdam, G. Wyszecki, H.W. Budde, H.R. Condit, S.T. Henderson, and J.L. Simonds. Spectral distribution of typical daylight as a function of correlated color temperature. JOSA, 1964.
[17] A. Chakrabarti and T. Zickler. Statistics of real-world hyperspectral images. In Proc. CVPR, 2011.
[18] B. Li, W. Xiong, W. Hu, and B. Funt. Evaluating combinational illumination estimation methods on real-world images. IEEE Trans. Image Proc., 2014. Data at http://www.escience.cn/people/BingLi/Data_TIP14.html.
[19] S. Bianco, C. Cusano, and R. Schettini. Color constancy using CNNs. arXiv:1504.04548 [cs.CV], 2015.
[20] D. Forsyth. A novel algorithm for color constancy. IJCV, 1990.
[21] W. Xiong and B. Funt. Estimating illumination chromaticity via support vector regression. J. Imag. Sci. Technol., 2006.
[22] A. Gijsenij, T. Gevers, and J. van de Weijer. Computational color constancy: Survey and experiments. IEEE Trans. Image Proc., 2011. Data at http://www.colorconstancy.com/.