{"title": "Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance", "book": "Advances in Neural Information Processing Systems", "page_first": 765, "page_last": 773, "abstract": "We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that without edge information high-quality results can be achieved, that are on par with methods exploiting this source of information. Finally, we present competitive results by integrating an additional edge model. We believe that our approach is a solid starting point for future development in this domain.", "full_text": "Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance\n\nPeter Vincent Gehler\n\nMax Planck Institut for Informatics\n\npgehler@mpii.de\n\nCarsten Rother\n\nMicrosoft Research Cambridge\n\ncarrot@microsoft.com\n\nMartin Kiefel, Lumin Zhang, Bernhard Sch\u00f6lkopf\n\nMax Planck Institute for Intelligent Systems\n{mkiefel,lumin,bs}@tuebingen.mpg.de\n\nAbstract\n\nWe address the challenging task of decoupling material properties from lighting\nproperties given a single image. In the last two decades virtually all works have\nconcentrated on exploiting edge information to address this problem. We take a\ndifferent route by introducing a new prior on reflectance, that models reflectance\nvalues as being drawn from a sparse set of basis colors. This results in a Random\nField model with global, latent variables (basis colors) and pixel-accurate output\nreflectance values. 
We show that without edge information high-quality results\ncan be achieved, that are on par with methods exploiting this source of information.\nFinally, we are able to improve on state-of-the-art results by integrating edge\ninformation into our model. We believe that our new approach is an excellent\nstarting point for future developments in this field.\n\n1 Introduction\n\nThe task of recovering intrinsic images is to separate a given input image into its material-dependent\nproperties, known as reflectance or albedo, and its light-dependent properties, such as shading, shadows,\nspecular highlights, and inter-reflectance. A successful separation of these properties would be\nbeneficial to a number of computer vision tasks. For example, an image which solely depends on\nmaterial-dependent properties is helpful for image segmentation and object recognition [11], while\na clean image of shading is a valuable input to shape-from-shading algorithms.\nAs in most previous work in this field, we cast the intrinsic image recovery problem into the following\nsimplified form, where each image pixel is the product of two components:\n\nI = sR .   (1)\n\nHere I \u2208 R^3 is the pixel\u2019s color, in RGB space, R \u2208 R^3 is its reflectance and s \u2208 R its \u201cshading\u201d.\nNote, we use \u201cshading\u201d as a proxy for all light-dependent properties, e.g. shadows. The fact that\nshading is only a 1D entity imposes some limitations. For example, shading effects stemming from\nmultiple light sources can only be modeled if all light sources have the same color.1 The goal of this\nwork is to estimate s and R given I. This problem is severely under-constrained, with 4 unknowns\nand 3 constraints for each pixel. Hence, a trivial solution to (1) is, for instance, I = R, s = 1 for all\npixels. 
The main focus of this paper is on exploring sensible priors for both shading and reflectance.\nDespite the importance of this problem surprisingly little research has been conducted in recent\nyears. Most of the key ideas were developed in the 70s and 80s. The recent comparative study [7] has\nshown that the simple Retinex method [9] from the 70s is still the top performing approach. Given\nthe progress in the last two decades on probabilistic models, inference and learning techniques, as\nwell as the improved computational power, we believe that now is a good time to revisit this problem.\nThis work, together with the recent papers [14, 4, 7, 15], is a first step in this direction.\n\n1 This problem can be overcome by utilizing a 3D vector for s, as done in [4], which we however do not\nconsider in this work.\n\n[Figure 1 panels: (a) Image I \u201cpaper1\u201d, (b) I (in RGB), (c) Reflectance R, (d) R (in RGB), (e) Shading s]\n\nFigure 1: An image (a), its color in RGB space (b), the reflectance image (c), its distribution in\nRGB space (d), and the shading image (e). Omer and Werman [12] have shown that an image of\na natural scene often contains only a few different \u201cbasis color lines\u201d. Panel (b) shows a dominant\ngray-scale color line and other color lines corresponding to the scribbles on the paper (a). These\ncolor lines are generated by taking a small set of \u201cbasis colors\u201d which are then linearly \u201csmeared\u201d\nout in RGB space. The basis colors are clearly visible in (d), where the cluster for white (top,\nright) is the dominant one. This \u201csmearing effect\u201d comes from properties of the scene (e.g. shading\nor shadows), and/or properties of the camera, e.g. motion blur. (Note, the few pixels in-between\nclusters are due to anti-aliasing effects.) In this work we approximate the basis colors by a simple\nmixture of isotropic Gaussians.\n\n
The main motivation of our work is to develop a simple, yet powerful probabilistic model for shading\nand reflectance estimation. In total we use three different types of factors. The first one is the\nmost commonly used factor and is a key ingredient of all Retinex-based methods. The idea is to\nextract those image edges which are (potentially) true reflectance edges and then to recover a new\nreflectance image that contains only these edges, using a set of Poisson equations. This term on\nits own is enough to recover a non-trivial decomposition, i.e. s \u2260 1. The next factor is a simple\nsmoothness prior on shading between neighboring image pixels, and has been used by some previous\nwork e.g. [14]. Note, there are a few works, which we discuss in more detail later, that extend these\npairwise terms to become patch-based. The third prior term is the main contribution of our work and\nis conceptually very different from the local (pairwise or patch-based) constraints of previous works.\nWe propose a new global (image-wide) sparsity prior on reflectance based on the findings of [12]\nand discussed in Fig. 1. In the absence of other factors this already produces non-trivial results. This\nprior takes the form of a Mixture of Gaussians, and encodes the assumption that the reflectance value\nfor each pixel is drawn from some mixing components, which in this context we refer to as \u201cbasis\ncolors\u201d. 
The complete model forms a latent variable Random Field model for which we perform\nMAP estimation.\nBy combining the different terms we are able to outperform the state of the art. If we use image-optimal\nparameter settings we perform on par with methods that use multiple images as input. To empirically\nvalidate this we use the database introduced in the comparative study [7].\n\n2 Related Work\n\nThere is a vast amount of literature on the problem of recovering intrinsic images. We refer the\nreader to detailed surveys in [8, 17, 7], and limit our attention to a few related works.\nBarrow and Tenenbaum [2] were the first to define the term \u201cintrinsic image\u201d. Around the same\ntime the first solution to this problem, known as the Retinex algorithm, was developed by Land and\nMcCann [9]. After that the Retinex algorithm was extended to two dimensions by Blake [3] and\nHorn [8], and later applied to color images [6]. The basic Retinex algorithm is a 2-step procedure:\n1) detect all image gradients which are caused by changes in reflectance; 2) recover a reflectance\nimage which preserves the detected reflectance gradients. The basic assumption of this approach\nis that small image gradients are more likely caused by a shading effect and strong gradients by a\nchange in reflectance. For color images this rule can be extended by treating changes in the 1D\nbrightness domain differently to changes in the 2D chromaticity space.2 This method, which we\ndenote as \u201cColor Retinex\u201d, was the top performing method in the recent comparison paper [7].\n\n2 Note, a gradient in chromaticity can only be caused by differently colored light sources, or inter-reflectance.\n\nNote, the only approach which could beat Retinex utilizes multiple images [19]. Surprisingly, the study\n[7] also shows that more sophisticated methods for training the reflectance edge detector, using\ne.g. 
image patches, did not perform better than the basic Retinex method. In particular the study\ntested two methods of Tappen et al. [17, 16]. A plausible explanation is offered, namely that these\nmethods may have over-fitted the small amount of training data. The method [17] has an additional\nintermediate step where a Markov Random Field (MRF) is used to \u201cpropagate\u201d reflectance gradients\nalong contour lines.\nThe paper [15] implements the same intuition as done here, namely that there is a sparse set of\nreflectances present in the scene. However, the two approaches differ in the following ways. In [15] a\nsparsity-enforcing term is included that penalizes reflectance differences from some prototype\nreferences. This term encourages all reflectances to take on the same value, while the model we\npropose in this paper allows for a mixture of different material reflectances and thus keeps their\ndiversity. Also, in contrast to [15], where a gradient-aware wavelet transform is used as a new\nrepresentation, here we work directly in the RGB domain. By doing so we directly extend previous\nintrinsic image models, which makes evident the gains that can be attributed to a global sparse\nreflectance term alone.\nRecently, Shen et al. [14] introduced an interesting extension of the Retinex method, which bears\nsome similarity with our approach. The key idea in their work is to perform a pre-processing step\nwhere the (normalized) reflectance image is partitioned into a few clusters. Each cluster is treated\nas a non-local \u201csuper-pixel\u201d. Then a variant of the Retinex method is run on this super-pixel image.\nThe conceptual similarity to our approach is the idea of performing an image-wide clustering step.\nHowever, the differences are that they do not formulate this idea as a joint probabilistic model over\nlatent reflectance \u201cbasis colors\u201d and shading variables. 
Furthermore, every pixel in a super-pixel\nmust have the same intensity, which is not the case in our work. Also, they need a Retinex type of\nedge term to avoid the trivial solution of s = 1.\nFinally, let us briefly mention techniques which use patch-based constraints, instead of pair-wise\nterms. The seminal work of Freeman et al. on learning low-level vision [5] formulates a probabilistic\nmodel for intrinsic images. In essence, they build a patch-based prior jointly over shading and\nreflectance. In a new test image the best explanation for reflectance and shading is determined.\nThe key idea is that patches do overlap, and hence form an MRF, where long-range propagation\nis possible. Since no large-scale ground-truth database was available at that time, they only train and\ntest on computer generated images of blob-like textures. Another patch-based method was recently\nsuggested in [4]. They introduce a new energy term which is satisfied when all reflectance values\nin a small, e.g. 3 \u00d7 3, patch lie on a plane in RGB space. This idea is derived from the Laplacian\nmatrix used for image matting [10]. On its own, this term often yields the trivial solution\ns = 1 in practice. For that reason additional user scribbles are provided to achieve high-quality results.3\n\n3 A Probabilistic Model for Intrinsic Images\n\nThe model outlined here falls into the class of Conditional Random Fields, specifying a conditional\nprobability distribution over reflectance R and shading s components for a given image I\n\np(s, R | I) \u221d exp (\u2212E(s, R | I)) .   (2)\n\nBefore we describe the energy function E in detail, let us specify the notation. We will denote with\nsubscripts i the values at location i in the image. Thus I_i is an image pixel (vector of dimension 3),\nR_i a reflectance vector (a 3-vector), s_i the shading (a scalar). The total number of pixels in an image\nis N. With boldface we denote vectors of components, e.g. s = (s1, . . . 
, sN ).\n\n3 We performed initial tests with this term. However, we found that it did not help to improve performance.\n\nThere are two ways to use the relationship (1) to formulate a model for shading and reflectance,\ncorresponding to two different image likelihoods p(I | s, R). One possible way is to relax the\nrelation (1) and for example assume a Gaussian likelihood p(I | s, R) \u221d exp(\u2212\u2016I \u2212 sR\u2016^2) to\naccount for some noise in the image formation process. This yields an optimization problem with\n4N unknowns. The second possibility is to assume a delta-prior around sR which results in the\nfollowing complexity reduction. Since I_i^c = s_i R_i^c has to hold for all color channels c \u2208 {R, G, B},\nthe unknown variables are specified up to scalar multipliers, in other words the direction of R_i is\nalready known. We rewrite R_i = r_i R\u20d7_i, with R\u20d7_i = I_i/\u2016I_i\u2016, leaving r = (r_1, . . . , r_N) to be the\nonly unknown variable. The shading components can be computed using s_i = \u2016I_i\u2016/r_i. Thus the\noptimization problem is reduced to a search over N variables.\nThe latter reduction is commonly exploited by intrinsic image algorithms in order to simplify the\nmodel [7, 14, 4] and in the remainder we will also make use of it. This allows us to write all model\nparts in terms of r.\nNote that there is a global scalar k by which the result s, R can be modified without affecting Eq. (1),\ni.e. I = (sk)((1/k)R). 
For visualization purposes k is chosen such that the results are visually closest\nto the known ground truth.\n\n3.1 Model\n\nThe energy function we describe here consists of three different terms that are linearly combined.\nWe will describe the three components and their influence in greater detail below; first we write the\noptimization problem that corresponds to a MAP solution in its most general form\n\nmin_{r_i, \u03b1_i; i=1,...,n}  w_s E_s(r) + w_r E_ret(r) + w_cl E_cl(r, \u03b1).   (3)\n\nNote, the global scale of the energy is not important, hence we can always fix one non-zero weight\namong w_s, w_r, w_cl to 1.\nShading Prior (E_s) We expect the shading of an image to vary smoothly over the image and we\nencode this in the following pairwise factors\n\nE_s(r) = \u2211_{i\u223cj} (r_i^{\u22121} \u2016I_i\u2016 \u2212 r_j^{\u22121} \u2016I_j\u2016)^2 ,   (4)\n\nwhere we use a 4-connected pixel graph to encode the neighborhood relation which we denote with\ni \u223c j. Because of the dependency on the inverse of r, this term is not jointly convex in r. Any\nmodel that includes this smoothness prior thus has the (potential) problem of multiple local minima.\nEmpirically, however, this function seems to be very well behaved: a large range of\ndifferent starting points for r resulted in the same minimum. Nevertheless, we use multiple restarts\nwith different starting points, see Section 3.2 on optimization.\n\nGradient Consistency (E_ret) As discussed in the introduction, the main idea of the Retinex algorithm\nis to distinguish edges that are due to shading variations from those that are caused\nby material reflectance changes. This idea is then implemented as follows. Assume that we already\nknow, or have classified, that an edge at location i, j in the input image is caused by a change in\nreflectance. 
Then we know the magnitude of the gradient that has to appear in the reflectance map by\nnoting that log(I_i) \u2212 log(I_j) = log(r_i R\u20d7_i) \u2212 log(r_j R\u20d7_j). Using the fact log(\u2016I_i\u2016) = log(I_i^c) \u2212 log(R\u20d7_i^c)\n(for all channels c) and assuming a squared deviation around the log gradient magnitude, this translates\ninto the following Gaussian MRF term on the reflectances\n\nE_ret(r) = \u2211_{i\u223cj} (log(r_i) \u2212 log(r_j) \u2212 g_ij(I)(log(\u2016I_i\u2016) \u2212 log(\u2016I_j\u2016)))^2 .   (5)\n\nIt remains to specify the classification function g(I) for the image edges. In this work we adopt the\nColor Retinex version that has been proposed in [7]. For each pixel i and a neighbor j we compute\nthe gradient of the intensity image and the gradient of the chromaticity change. If both gradients\nexceed a certain threshold (\u03b8_g and \u03b8_c, respectively), the edge at i, j is classified as being a \u201creflectance edge\u201d\nand in this case g_ij(I) = 1. The two parameters, the thresholds \u03b8_g, \u03b8_c for the intensity\nand the chromaticity change, are then estimated using leave-one-out cross-validation. It is worth\nnoting that this term is qualitatively different from the smoothness prior on shading (4) even for\npixels where g_ij(I) = 0. Here, the log-difference is penalized whereas the shading smoothness\ndoes also depend on the intensity values \u2016I_i\u2016, \u2016I_j\u2016. By setting w_cl, w_s = 0 in Eq. (3) we recover\nColor Retinex [7].\nGlobal Sparse Reflectance Prior (E_cl) Motivated by the findings of [12] we include a term\nthat acts as a global potential on the reflectances and favors the decomposition into a few\nreflectance clusters. We assume C different reflectance clusters, each of which is denoted by\nR\u0303_c, c \u2208 {1, . . . , C}. 
Every reflectance component r_i belongs to one of the clusters and we denote\nits cluster membership with the variable \u03b1_i \u2208 {1, . . . , C}. This is summarized in the following\nenergy term\n\nE_cl(r, \u03b1) = \u2211_{i=1}^{n} \u2016r_i R\u20d7_i \u2212 R\u0303_{\u03b1_i}\u2016^2.   (6)\n\nHere, both continuous r and discrete \u03b1 variables are mixed. This represents a global potential, since\nthe cluster means depend on the assignment of all pixels in the image. For fixed \u03b1, this term is\nconvex in r and for fixed r the optimum of \u03b1 is a simple assignment problem. The cluster means R\u0303_c\nare optimally determined given r and \u03b1: R\u0303_c = (1 / |{i : \u03b1_i = c}|) \u2211_{i:\u03b1_i=c} r_i R\u20d7_i.\n\nFigure 2: A crop from the image \u201cpanther\u201d. Left: input image I and true decomposition (R, s). Note,\nthe colors in the reflectance image (True R) have been modified on purpose such that there are exactly 4\ndifferent colors. The second column shows a clustering (here from the solution with w_s = 0), where\neach cluster has an arbitrary color. The remaining columns show results with various settings for C\nand w_s (left reflectance image, right shading image). Top row is the result for C = 4 and bottom\nrow for C = 50 clusters, columns are results for w_s = 0, 10^{\u22125}, and 0.1. Below the images is the\ncorresponding LMSE score (described in Section 4.1). (Note, results are visually slightly different\nsince the unknown overall global scaling factor k is set differently, that is I = (sk)((1/k)R).)\n\nRelationship between E_cl and E_s The example in Figure 2 highlights the influence of the terms.\nWe use a simplified version of model (3), namely E_cl + w_s E_s, and vary w_s as well as the number of clusters.\nLet us first consider the case where w_s = 0 (third column). Independent of the clustering we\nget an imperfect result. This is expected since there is no constraint across clusters. 
Hence the\nshading within one cluster looks reasonable, but is not aligned across clusters. By adding a little\nbit of smoothing (w_s = 10^{\u22125}; 4th column), this problem is cured for both clusterings. It is very\nimportant to note that too many clusters (here C = 50) do not affect the result very much. The reason\nis that enough clustering constraints are present to recover the variation in shading. If we were to\ngive each pixel its own cluster this would no longer be true and we would get the trivial solution of\ns = 1. Finally, results deteriorate when the smoothing term is too strong (last column, w_s = 0.1),\nsince it prefers a constant shading. Note that for this simple toy example the smoothness prior was\nnot important; for real images, however, the best results are achieved by using a non-zero w_s.\n\n3.2 Optimization of (3)\n\nThe MAP problem (3) consists of both discrete and continuous variables and we solve it using\ncoordinate descent. The entire algorithm is summarized in Algorithm 1.4\nGiven an initial value for \u03b1 we have seen empirically that our function tends to yield the same\nsolution, irrespective of the starting point r. In order to be also robust with respect to\nthis initial choice, we choose from a range of initial r values as described next. From these starting\npoints we choose the one with the lowest objective value (energy) and its corresponding result.\nWe have seen empirically that this procedure gives stable results. For instance, we virtually always\nachieve a lower energy compared to using the ground truth r as initial start point.\n\n4 Code available http://people.tuebingen.mpg.de/mkiefel/projects/intrinsic\n\nAlgorithm 1 Coordinate Descent for solving (3)\n1: Select r^0 as described in the text\n2: \u03b1^0 \u2190 K-Means clustering of {r_i^0 R\u20d7_i, i = 1, . . . , N}\n3: t \u2190 0\n4: repeat\n5:   r^{t+1} \u2190 optimize (3) with \u03b1^t fixed\n6:   R\u0303_c = \u2211_{i:\u03b1_i=c} r_i R\u20d7_i / |{i : \u03b1_i = c}|\n7:   \u03b1^{t+1} \u2190 assign new cluster labels with r^{t+1} fixed\n8:   t \u2190 t + 1\n9: until E(r^{t\u22121}, \u03b1^{t\u22121}) \u2212 E(r^t, \u03b1^t) < \u03b8\n\n
comment               E_s  E_cl  E_ret  LOO-CV  best single  image opt.\nColor Retinex         -    -     \u2713      29.5    29.5         25.5\nno edge information   \u2713    \u2713     -      30.0    30.6         18.2\nCol-Ret+ global term  -    \u2713     \u2713      27.2    24.4         18.1\nfull model            \u2713    \u2713     \u2713      27.4    24.4         16.1\n\nTable 1: Comparing the effect of including different terms. The column \u201cbest-single\u201d is the parameter\nset that works best on all 16 images jointly, \u201cimage opt.\u201d is the result when choosing the\nparameters optimal for each image individually, based on ground truth information.\n\nInitialization of r It is reasonable to assume that the output has a fixed range, i.e. 0 \u2264 R_i^c, s_i \u2264 1\n(for all c, i).5 In particular, this is true for the data in [7]. From these constraints we can derive\nthat \u2016I_i\u2016 \u2264 r_i \u2264 3. Given that, we use the following three starting points for r, by varying \u03b3 \u2208\n{0.3, 0.5, 0.7}: r_i = \u03b3\u2016I_i\u2016 + 3(1 \u2212 \u03b3). Additionally we choose the start point r = 1. From these\nfour different initial settings we choose the result which corresponds to the lowest final energy.\nInitialization of \u03b1 Given an initial value for r we can compute the terms in Eq. (6) and use K-Means\nclustering to optimize it. 
We use the best solution from five restarts.\nUpdating r For a given fixed \u03b1 this is implemented using a conjugate gradient descent solver [1].\nThis typically converges in a few hundred iterations for the images used in the experiments.\nUpdating \u03b1 For given r this is a simple assignment problem: \u03b1_i = argmin_{c=1,...,C} \u2016r_i R\u20d7_i \u2212 R\u0303_c\u2016^2.\n\n4 Experiments\n\nFor the empirical evaluation we use the intrinsic image database that has been introduced in [7].\nThis dataset consists of 16 different images for all of which the ground truth shading and reflectance\ncomponents are available. We refer to [7] for details on how this data was collected. Some of the\nimages can be seen in Figure 3. In all experiments we compare against Color Retinex which was\nfound to be the best performing method among those that take a single image as input. The method\nfrom [19] yields better results but requires multiple input images under different lighting conditions.\n\n4.1 Error metric\n\nWe report the performance of the algorithms using the two different error metrics that have been\nsuggested by the creators of the database [7]. The first metric is the average of the localized mean\nsquared error (LMSE) between the predicted and true shading and predicted and true reflectance\nimage.6 Since the LMSE values vary considerably we also use the average rank of the algorithm.\n\n4.2 Experimental set-up and parameter learning\n\nAll free parameters of the models, e.g. the weights w_cl, w_s, w_r and the gradient thresholds \u03b8_c, \u03b8_g,\nhave been chosen using a leave-one-out estimate (LOO-CV). Due to the high variance of the scores\nfor the images we used the median error to score the parameters. Thus for image i the parameter set\nwas chosen that leads to the lowest median error on all images except i. Additionally we record the\nbest single parameter set that works well on all images, and the score that is obtained when using\nthe optimal parameters on each image individually. 
Although the latter estimate requires knowledge of the\nground truth, we are interested in this lower bound of the error; in an interactive\nscenario a user can provide additional information to achieve this, as in [4].\nWe select the parameters from the following ranges. Whenever used, we fix w_cl = 1 since it\nsuffices to specify the relative difference between the parameters. For models using both the cluster\nand shading smoothness terms, we select from w_s \u2208 {0.001, 0.01, 0.1}, for models that use the\ncluster and Color Retinex term w_r \u2208 {0.001, 0.01, 0.1, 1, 10}. When all three terms are non-zero,\nwe vary w_s as above paired with w_r \u2208 {0.1 w_s, w_s, 10 w_s}. The gradient thresholds are varied\nin \u03b8_g, \u03b8_c \u2208 {0.075, 1} which yields four possible configurations. The reflectance cluster count is\nvaried in C \u2208 {10, 50, 150}.\n\n5 This assumption is violated if there is no global scalar k such that 0 \u2264 (1/k)R_i^c, k s_i \u2264 1.\n6 We multiply by 1000 for easier readability.\n\n4.3 Comparison - Model variations\n\nIn a first set of experiments we investigate the influence of using combinations of the prior terms\ndescribed in Section 3.1. The numerical results are summarized in Table 1.\nThe first observation is that the Color Retinex algorithm (1st row) performs about as well as the\nsystem using a shading smoothness prior together with the global factor E_cl (2nd row). Note that\nthe latter system does not use any gradient information for estimation. This confirms our intuition\nthat the term E_cl provides strong coupling information between reflectance components, as also\ndiscussed in Figure 2. The lower value for the image optimal setting of 18.2 compared to 25.5 for\nColor Retinex indicates that one would benefit from a better parameter estimate, i.e. the flexibility\nof this algorithm is higher. Equipping Color Retinex with the global reflectance term improves all\nrecorded results (3rd vs 2nd row). Again it seems that the LOO-CV parameter estimation is more\nstable in this case. Combining all three parts (4th row) does not improve the results over Color\nRetinex with the reflectance prior. With knowledge about the optimal image parameter it yields a\nlower LMSE score (16.1 vs 18.1).\n\n
4.4 Comparison to Literature\n\nIn Table 2 we compare the numerical results of our method to other\nintrinsic image algorithms. We again include the single best parameter and image dependent optimal\nparameter set. Although those are positively biased and obviously decrease with model complexity\nwe believe that they are informative, given the parameter estimation problems due to the diverse and\nsmall database. The full model using all terms E_cl, E_s and E_ret improves over all the compared methods\nthat use only a single image as input, except SHE\u00d7 (see below). The difference in rank between (Col-Ret)\nand (full model) indicates that\nthe latter model is almost always better (direct comparison: 13 out of 16 images) than Color Retinex\nalone. The full model is even better on 6/16 images than the Weiss algorithm [19] that uses multiple\nimages. Regarding the results of SHE\u00d7, we could not resolve with certainty whether the reported\nresults should be compared as \u201cbest single\u201d or \u201cim.opt.\u201d (most parameters in [15] are common to\nall images, the strategy for setting \u03bb_max is not entirely specified). Assuming \u201cbest single\u201d, SHE\u00d7 is\nbetter in terms of LMSE; in direct comparison both models are better on 8/16 images. Comparing\nas an \u201cim.opt.\u201d setting, our full model yields lower LMSE and is better on 12/16 images.\n\nmethod         LOO-CV  rank  best single  im. opt.\nTAP05 [17]     56\u2217     -     -            -\nTAP06 [16]     39\u2217     -     -            -\nSHE [14]+      n/a     n/a   56.2         n/a\nSHE [15]\u00d7      n/a     n/a   (20.4)       -\nBAS [7]        72.6    5.1   60.3         36.6\nGray-Ret [7]   40.7    4.9   40.7         28.9\nCol-Ret        29.5    3.7   29.5         25.5\nfull model     27.4    3.0   24.4         16.1\nWeiss [19]     21.5    2.7   21.5         21.5\nWeiss+Ret [7]  16.4    1.7   16.4         15.0\n\nTable 2: Method comparison with other intrinsic image algorithms also compared in [7]. Refer to Tab. 1\nfor a description of the quantities. Note that the last two methods from [19]\nuse multiple input images. For entries \u2019-\u2019 we had no individual results (and no code), the two numbers\nmarked \u2217 are estimated from Fig. 4a of [7]. SHE+ is our implementation. SHE\u00d7: note that in [15] results\nwere only given for 13 of 16 images from [7]; the additional data was kindly provided by the authors.\n\n
4.5 Visual Comparison\n\nIn addition to the quantitative numbers we present some visual comparisons in Figure 3, since the\nnumbers do not always reflect visually pleasing results. For example note that the method BAS that\neither attributes all variations to shading (r = 1) or to reflectance alone (s = 1) already yields an\nLMSE of 36.6, if for every image the optimal choice between the two is made. Numerically this\nis better than [16, 17] and \u201cGray-Ret\u201d with proper model selection. However the results of those\nalgorithms are of course visually more pleasing. We have also tested our method on various other\nreal-world images and results are visually similar to [15, 4]. Due to missing ground truth and lack\nof space we do not show them.\nFigure 3 shows results with various models and settings. The \u201cturtle\u201d example (top three rows)\nshows the effect of the global term. Without the global term (Color Retinex with LOO-CV and\nimage optimal) the result is imperfect. The key problem of Retinex is highlighted in the two zoom-in\npictures with blue border (second column, left side). 
The upper one shows the detected edges in\nblack. As expected the Retinex result has discontinuities at these edges, but over-smooths otherwise\n(lower picture). With a global term (remaining three results) the images look visually much better.\nNote that the third row shows an extreme variation for the full model when switching from image\noptimal setting to LOO-CV setting. The example \u201cteabag2\u201d illustrates nicely the point that Color\nRetinex and our model without edge term (i.e. no Retinex term) achieve very complementary results.\nOur model without edges is sensitive to edge transitions, while Color Retinex has problems with fine\ndetails, e.g. the small text below \u201cTWININGS\u201d. Combining all terms (full model) gives the best result\nwith lowest LMSE score (16.4). Note, in this case we chose for both methods the image optimal\nsettings to illustrate the potential of each model.\n\nFigure 3: Various results obtained with different methods and settings (more in supplementary\nmaterial). For each result: left reflectance image, right shading image.\n\n5 Discussion and Conclusion\n\nWe have introduced a new probabilistic model for intrinsic images that explicitly models the\nreflectance formation process. Several extensions are conceivable, e.g. one can relax the condition\nI = sR to allow deviations. Another refinement would be to replace the Gaussian cluster term\nwith a color line term [12]. Building on the work of [5, 4] one can investigate various higher-order\n(patch-based) priors for both reflectance and shading.\nA main concern is that in order to develop more advanced methods a larger and even more diverse\ndatabase than the one of [7] is needed. This is especially true to enable learning of richer models\nsuch as Fields of Experts [13] or Gaussian CRFs [18]. 
We acknowledge the complexity of collecting\nground truth data, but believe that the creation of a new, much larger dataset is a necessity for\nfuture progress in this field.\n\nReferences\n[1] www.gatsby.ucl.ac.uk/~edward/code/minimize.\n[2] H. G. Barrow and J. M. Tenenbaum. Recovering intrinsic scene characteristics from images. Computer Vision Systems, 1978.\n[3] A. Blake. Boundary conditions for lightness computation in Mondrian world. Computer Vision, Graphics, and Image Processing, 1985.\n[4] A. Bousseau, S. Paris, and F. Durand. User assisted intrinsic images. SIGGRAPH Asia, 2009.\n[5] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning low-level vision. International Journal of Computer Vision (IJCV), 2000.\n[6] B. V. Funt, M. S. Drew, and M. Brockington. Recovering shading from color images. In European Conference on Computer Vision (ECCV), 1992.\n[7] R. Grosse, M. K. Johnson, E. H. Adelson, and W. T. Freeman. Ground-truth dataset and baseline evaluations for intrinsic image algorithms. In International Conference on Computer Vision (ICCV), 2009.\n[8] B. K. Horn. Robot Vision. MIT Press, 1986.\n[9] E. Land and J. McCann. Lightness and retinex theory. Journal of the Optical Society of America, 1971.\n[10] A. Levin, D. Lischinski, and Y. Weiss. A closed form solution to natural image matting. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 30(2), 2008.\n[11] Y.-H. W. Ming Shao. Recovering facial intrinsic images from a single input. Lecture Notes in Computer Science, 2009.\n[12] I. Omer and M. Werman. Color lines: Image specific color representation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.\n[13] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision (IJCV), 82(2):205\u2013229, 2009.\n[14] L. Shen, P. Tan, and S. Lin. Intrinsic image decomposition with non-local texture cues. 
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.\n[15] L. Shen and C. Yeo. Intrinsic images decomposition using a local and global sparse representation of reflectance. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.\n[16] M. Tappen, E. Adelson, and W. Freeman. Estimating intrinsic component images using non-linear regression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.\n[17] M. Tappen, W. Freeman, and E. Adelson. Recovering intrinsic images from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2005.\n[18] M. Tappen, C. Liu, E. H. Adelson, and W. T. Freeman. Learning Gaussian conditional random fields for low-level vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.\n[19] Y. Weiss. Deriving intrinsic images from image sequences. In International Conference on Computer Vision (ICCV), 2001.\n", "award": [], "sourceid": 515, "authors": [{"given_name": "Carsten", "family_name": "Rother", "institution": null}, {"given_name": "Martin", "family_name": "Kiefel", "institution": null}, {"given_name": "Lumin", "family_name": "Zhang", "institution": null}, {"given_name": "Bernhard", "family_name": "Sch\u00f6lkopf", "institution": null}, {"given_name": "Peter", "family_name": "Gehler", "institution": null}]}