{"title": "Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution", "book": "Advances in Neural Information Processing Systems", "page_first": 1041, "page_last": 1048, "abstract": null, "full_text": "Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution\n\nHyun Jin Park and Te Won Lee\n\nInstitute for Neural Computation, UCSD\n9500 Gilman Drive, La Jolla, CA 92093-0523\n{hjinpark, tewon}@ucsd.edu\n\nAbstract\n\nCapturing dependencies in images in an unsupervised manner is important for many image processing applications. We propose a new method for capturing nonlinear dependencies in images of natural scenes. The method extends linear Independent Component Analysis (ICA) by building a hierarchical model based on ICA and a mixture of Laplacian distributions. The model parameters are learned via an EM algorithm, and the model can accurately capture variance correlation and other high-order structures in a simple manner. We visualize the learned variance structure and demonstrate applications to image segmentation and denoising.\n\n1 Introduction\n\nUnsupervised learning has become an important tool for understanding biological information processing and for building intelligent signal processing methods. Real biological systems, however, are much more robust and flexible than current artificial intelligence, mostly due to the much more efficient representations used in biological systems. Therefore, unsupervised learning algorithms that capture more sophisticated representations can provide a better understanding of neural information processing and also provide improved algorithms for signal processing applications. For example, independent component analysis (ICA) can learn representations similar to simple cell receptive fields in visual cortex [1] and has also been applied to feature extraction, image segmentation, and denoising [2,3]. 
ICA approximates the statistics of natural image patches by Eq.(1,2), where X is the data and u is a source signal whose distribution is a product of sparse distributions such as generalized Laplacians.\n\nX = Au   (1)\n\nP(u) = ∏_i P(u_i)   (2)\n\nBut the representation learned by the ICA algorithm is relatively low-level. Biological systems contain higher-level representations such as contours, textures and objects, which are not well represented by the linear ICA model. ICA learns only linear dependencies between pixels, by finding strongly correlated linear axes. Therefore, the modeling capability of ICA is quite limited. Previous approaches showed that one can learn more sophisticated high-level representations by capturing nonlinear dependencies in a post-processing step after ICA [4,5,6,7,8].\n\nThese efforts have centered on variance correlation in natural images. After ICA, a source signal is not linearly predictable from the others. However, given variance dependencies, a source signal is still \u201cpredictable\u201d in a nonlinear manner, and this variance dependency cannot be removed by any linear transformation. Several researchers have proposed extensions to capture such nonlinear dependencies. Portilla et al. used a Gaussian Scale Mixture (GSM) to model variance dependency in the wavelet domain. This model can learn variance correlation in the source prior and showed improvement in image denoising [4], but dependency is defined only between a subset of wavelet coefficients. Hyvarinen and Hoyer suggested a special variance-related distribution to model the variance-correlated source prior. This model can learn groupings of dependent sources (Subspace ICA) or topographic arrangements of correlated sources (Topographic ICA) [5,6]. Similarly, Welling et al. 
suggested a product-of-experts model in which each expert represents a variance-correlated group [7]. The product form of the model enables applications to image denoising, but these models do not reveal higher-order structures explicitly.\n\nOur model is motivated by Lewicki and Karklin, who proposed a 2-stage model where the 1st stage is an ICA model (Eq. (3)) and the 2nd stage is a linear generative model in which another source v generates the logarithmic variance of the 1st stage (Eq. (4)) [8].\n\nP(u|λ) = c exp(−|u/λ|^q)   (3)\n\nlog[λ] = Bv   (4)\n\nThis model captures the variance dependency structure explicitly, but treating the variance as an additional random variable introduces another level of complexity and requires several approximations. Thus, it is difficult to obtain a simple analytic PDF of the source signal u and to apply the model to image processing problems.\n\nWe propose a hierarchical model based on ICA and a mixture of Laplacian distributions. Our model can be considered a simplification of the model in [8], obtained by constraining v to be a 0/1 random vector in which only one element can be 1. Our model is computationally simpler but can still capture variance dependency. Experiments show that our model reveals higher-order structures similar to [8]. In addition, our model provides a simple parametric PDF of variance-correlated priors, which is an important advantage for adaptive signal processing. Utilizing this, we demonstrate simple applications to image segmentation and image denoising. Our model provides an improved statistical model of natural images and can be used for other applications including feature extraction, image coding, and learning of even higher-order structures. 
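The linear generative model of Eq. (1,2) is easy to simulate; below is a minimal NumPy sketch with toy dimensions and a random (hypothetical) mixing matrix A of our own choosing, not the basis learned in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 10000                      # toy source dimension and sample count
A = rng.normal(size=(M, M))          # a random, hypothetical mixing matrix

# Eq. (2): the source prior factorizes into independent sparse (Laplacian) marginals.
u = rng.laplace(loc=0.0, scale=1.0, size=(M, N))

# Eq. (1): each column of X is one generated "image patch".
X = A @ u

def excess_kurtosis(s):
    """Excess kurtosis: 0 for a Gaussian, 3 for a Laplacian (a signature of sparsity)."""
    s = s - s.mean()
    return np.mean(s**4) / np.mean(s**2) ** 2 - 3.0

print(excess_kurtosis(u[0]))         # typically near 3 for Laplacian sources
```

The heavy-tailed (high-kurtosis) marginals are what make the independent prior of Eq. (2) "sparse"; the variance dependencies discussed next are what this linear model cannot capture.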
\n\n2 Modeling nonlinear dependencies\n\nWe propose a hierarchical, 2-stage model in which the 1st stage is an ICA source signal model and the 2nd stage is a mixture model with different variances (figure 1). In natural images, the correlation of variance reflects different types of regularities in the real world. Such specialized regularities can be summarized as \u201ccontext\u201d information. To model context-dependent variance correlation, we use a mixture model in which Laplacian distributions with different variances represent different contexts. For each image patch, a context variable Z \u201cselects\u201d which Laplacian distribution represents the ICA source signal u. The Laplacian distributions have zero mean but different variances. The advantage of the Laplacian distribution for modeling context is that a sparse distribution can be modeled with a single Laplacian, whereas at least two Gaussians are needed to do the same. Conventional ICA is also a special case of our model with one Laplacian. We define the mixture model and its learning algorithm in the next sections.\n\nFigure 1: Proposed hierarchical model (the 1st stage is the ICA generative model; the 2nd stage is a mixture of \u201ccontext dependent\u201d Laplacian distributions which model U. Z is a random variable that selects the Laplacian distribution that generates the given image patch.)\n\n2.1 Mixture of Laplacian Distribution\n\nWe define the PDF of a mixture of M-dimensional Laplacian distributions as Eq.(5), where N is the number of data samples and K is the number of mixtures. 
\n\nP(U|Λ,Π) = ∏_n P(u_n|Λ,Π) = ∏_n Σ_k π_k P(u_n|λ_k) = ∏_n Σ_k π_k ∏_m (1/(2λ_{m,k})) exp(−|u_{m,n}|/λ_{m,k})   (5)\n\nU = (u_1, u_2, ..., u_n, ..., u_N), u_n = (u_{1,n}, u_{2,n}, ..., u_{M,n}): n-th data sample; λ_k = (λ_{1,k}, λ_{2,k}, ..., λ_{M,k}): variance (scale) parameters of the k-th Laplacian distribution; Λ = (λ_1, λ_2, ..., λ_k, ..., λ_K); π_k: probability of Laplacian distribution k, with Π = (π_1, ..., π_K) and Σ_k π_k = 1.\n\nIt is not easy to maximize Eq.(5) directly, so we use the EM (expectation maximization) algorithm for parameter estimation. Here we introduce a new hidden context variable Z that represents which Laplacian k is responsible for a given data point. 
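The likelihood of Eq. (5) can be evaluated directly; the following is a small NumPy sketch (the helper name and the log-domain evaluation are our own, not from the paper):

```python
import numpy as np

def mixture_laplacian_loglik(U, Lam, Pi):
    """Log of Eq. (5).  U: M x N sources (columns are samples u_n);
    Lam: M x K Laplacian scales lambda_{m,k}; Pi: K mixture weights summing to one."""
    # log P(u_n | k) = sum_m [ -log(2 lambda_{m,k}) - |u_{m,n}| / lambda_{m,k} ]
    log_pk = -np.log(2.0 * Lam).sum(axis=0)[None, :] - np.abs(U).T @ (1.0 / Lam)
    # log P(u_n) = logsumexp_k [ log pi_k + log P(u_n | k) ], computed stably
    a = np.log(Pi)[None, :] + log_pk               # N x K
    m = a.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(a - m).sum(axis=1))).sum())

# With K = 1 the model reduces to the conventional independent Laplacian prior.
U = np.array([[1.0, -2.0], [0.5, 0.0]])
print(mixture_laplacian_loglik(U, Lam=np.array([[1.0], [2.0]]), Pi=np.array([1.0])))
```

For K = 1 each term is just the log of a product of univariate Laplacian densities, which is a convenient sanity check for an implementation.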
Assuming we know the hidden variable Z, we can write the likelihood of the data and Z as Eq.(6),\n\nP(U,Z|Λ,Π) = ∏_n P(u_n, z^n|Λ,Π) = ∏_n ∏_k [ (π_k)^{z_k^n} ∏_m ( (1/(2λ_{m,k}))^{z_k^n} exp(−z_k^n |u_{m,n}|/λ_{m,k}) ) ]   (6)\n\nz_k^n: hidden binary random variable, 1 if the n-th data sample is generated from the k-th Laplacian, 0 otherwise (Z = (z_k^n) and Σ_k z_k^n = 1 for all n = 1…N).\n\n2.2 EM algorithm for learning the mixture model\n\nThe EM algorithm maximizes the log likelihood of the data averaged over the hidden variable Z. The log likelihood and its expectation can be computed as in Eq.(7,8). 
\n\nlog P(U,Z|Λ,Π) = Σ_{n,k} [ z_k^n log(π_k) + z_k^n Σ_m ( log(1/(2λ_{m,k})) − |u_{m,n}|/λ_{m,k} ) ]   (7)\n\nE{log P(U,Z|Λ,Π)} = Σ_{n,k} E{z_k^n} [ log(π_k) + Σ_m ( log(1/(2λ_{m,k})) − |u_{m,n}|/λ_{m,k} ) ]   (8)\n\nThe expectation in Eq.(8) can be evaluated if we are given the data U and estimated parameters Λ and Π. For Λ and Π, the EM algorithm uses the current estimates Λ' and Π'.\n\nE{z_k^n} ≡ E{z_k^n | u_n, Λ', Π'} = P(z_k^n = 1 | u_n, Λ', Π') = P(u_n | z_k^n = 1, Λ', Π') P(z_k^n = 1 | Λ', Π') / P(u_n | Λ', Π') = (π_k'/c_n) ∏_{m=1}^M (1/(2λ'_{m,k})) exp(−|u_{m,n}|/λ'_{m,k})   (9)\n\nwhere the normalization constant can be computed as\n\nc_n = P(u_n | Λ', Π') = Σ_{k=1}^K P(u_n | z_k^n = 1, Λ', Π') P(z_k^n = 1 | Λ', Π') = Σ_{k=1}^K π_k' ∏_{m=1}^M (1/(2λ'_{m,k})) exp(−|u_{m,n}|/λ'_{m,k})   (10)\n\nThe EM algorithm works by maximizing Eq.(8), given the expectation computed from Eq.(9,10). Eq.(9,10) are computed using the Λ' and Π' estimated in the previous iteration; this is the E-step. In the M-step, we maximize Eq.(8) over the parameters Λ and Π.\n\nFirst, we maximize Eq.(8) with respect to Λ by setting the derivative to 0:\n\n∂E{log P(U,Z|Λ,Π)}/∂λ_{m,k} = Σ_n E{z_k^n} ( −1/λ_{m,k} + |u_{m,n}|/(λ_{m,k})^2 ) = 0  ⇒  λ_{m,k} = Σ_n E{z_k^n}·|u_{m,n}| / Σ_n E{z_k^n}   (11)\n\nSecond, for the maximization of Eq.(8) with respect to Π, we can rewrite Eq.(8) as\n\nE{log P(U,Z|Λ,Π)} = Σ_{n,k} E{z_k^n} log(π_k) + C   (12)\n\nwhere C collects the terms that do not depend on Π. Because the π_k are constrained to sum to one, the derivative of Eq.(12) with respect to Π cannot simply be set to 0; instead, we use the method of Lagrange multipliers. A Lagrange function can be defined as Eq.(13), where ρ is a Lagrange multiplier:\n\nL(Π, ρ) = Σ_{n,k} E{z_k^n} log(π_k) + ρ (Σ_k π_k − 1)   (13)\n\nBy setting the derivatives of Eq.(13) with respect to ρ and Π to 0, we obtain the maximization solution with respect to Π. We just show the solution in Eq.(14). 
\n\n∂L(Π,ρ)/∂ρ = 0, ∂L(Π,ρ)/∂Π = 0  ⇒  π_k = ( Σ_n E{z_k^n} ) / ( Σ_k Σ_n E{z_k^n} )   (14)\n\nThe EM algorithm can then be summarized as in figure 2. For the convergence criterion, we can use the expectation of the log likelihood, which can be calculated from Eq.(8).\n\n1. Initialize λ_{m,k} = E{|u_m|} + e, π_k = 1/K (e is small random noise)\n2. Calculate the expectation: E{z_k^n} ≡ E{z_k^n | u_n, Λ', Π'} = (π_k'/c_n) ∏_{m=1}^M (1/(2λ'_{m,k})) exp(−|u_{m,n}|/λ'_{m,k})\n3. Maximize the log likelihood given the expectation: λ_{m,k} ← Σ_n E{z_k^n}·|u_{m,n}| / Σ_n E{z_k^n},  π_k ← ( Σ_n E{z_k^n} ) / ( Σ_k Σ_n E{z_k^n} )\n4. If converged, stop; otherwise repeat from step 2.\n\nFigure 2: Outline of the EM algorithm for learning the mixture model\n\n3 Experimental Results\n\nHere we provide examples of image data and show how the learning procedure is performed for the mixture model. We also visualize the learned variances, which reveal the structure of the variance correlation, and present an application to image denoising.\n\n3.1 Learning Nonlinear Dependencies in Natural Images\n\nAs shown in figure 1, the 1st stage of the proposed model is simply the linear ICA. 
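The E- and M-steps of section 2.2 (Eqs. 9-11, 14 and figure 2) can be sketched in NumPy as follows; the function name, the initialization noise level, and the fixed iteration count (in place of the log-likelihood convergence test of figure 2) are our own simplifications, not the authors' implementation:

```python
import numpy as np

def em_mixture_laplacian(U, K, n_iter=300, seed=0, eps=1e-12):
    """EM for a mixture of M-dimensional Laplacians (Eqs. 9-11, 14); a toy sketch.

    U is M x N; returns scales Lam (M x K), weights Pi (K,) and
    responsibilities R (N x K)."""
    M, N = U.shape
    rng = np.random.default_rng(seed)
    absU = np.abs(U)
    Pi = np.full(K, 1.0 / K)                       # pi_k = 1/K (figure 2, step 1)
    # lambda_{m,k} = E{|u_m|} plus small random noise (figure 2, step 1)
    Lam = absU.mean(axis=1, keepdims=True) * (1.0 + 0.1 * rng.random((M, K)))
    for _ in range(n_iter):
        # E-step, Eqs. (9)-(10): responsibilities E{z_k^n}, in the log domain
        # for numerical stability.
        log_r = (np.log(Pi)[None, :]
                 - np.log(2.0 * Lam).sum(axis=0)[None, :]
                 - absU.T @ (1.0 / Lam))           # N x K
        log_r -= log_r.max(axis=1, keepdims=True)
        R = np.exp(log_r)
        R /= R.sum(axis=1, keepdims=True)          # row-normalize (the 1/c_n term)
        # M-step, Eq. (11): lambda_{m,k} = sum_n E{z_k^n}|u_{m,n}| / sum_n E{z_k^n}
        Nk = R.sum(axis=0)
        Lam = (absU @ R) / (Nk[None, :] + eps)
        # M-step, Eq. (14): pi_k = sum_n E{z_k^n} / sum_{k,n} E{z_k^n}
        Pi = Nk / N
    return Lam, Pi, R
```

On synthetic data drawn from two Laplacian components with well-separated scales, the recovered Lam and Pi approach the generating parameters up to a permutation of components.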
The ICA matrix A and W (= A^{-1}) are learned by the FastICA algorithm [9]. We sampled 10^5 (= N) data points from 16x16 patches (256 dimensions) of natural images and used them for both first- and second-stage learning. The ICA input dimension is 256, and the source dimension is set to 160 (= M). The learned ICA basis is partially shown in figure 1. The 2nd-stage mixture model is learned given the ICA source signals, with the number of mixtures set to 16, 64, or 256 (= K). Training by the EM algorithm is fast, and several hundred iterations are sufficient for convergence (0.5 hours on a 1.7 GHz Pentium PC).\n\nFor the visualization of the learned variance, we adapted the visualization method from [8]. Each dimension of the ICA source signal corresponds to an ICA basis vector (a column of A), and each basis vector is localized in both image and frequency space. For each Laplacian distribution, we can therefore display its variance vector as a set of points in image and frequency space, with each point color-coded by variance value as in figure 3.\n\nFigure 3: Visualization of learned variances (panels a1 and a2 visualize the variance of Laplacian #4; b1 and b2 show that of Laplacian #5. High variance values are mapped to red and low values to blue. In Laplacian #4, variances for diagonally oriented edges are high, while in Laplacian #5, variances for edges at the spatial right are high. Variance structures are related to \u201ccontexts\u201d in the image: for example, Laplacian #4 explains image patches that have oriented textures or edges, and Laplacian #5 captures patches whose left side is clean but whose right side is filled with randomly oriented edges.)\n\nA key idea of our model is that we can mix independent distributions to obtain a nonlinearly dependent distribution. This modeling power is shown in figure 4. 
\n\nFigure 4: Joint distribution of nonlinearly dependent sources ((a) is a joint histogram of 2 ICA sources, (b) is computed from the learned mixture model, and (c) is from a learned single-Laplacian model. In (a), the variance of u2 is smaller than that of u1 in the center area (arrow A), but almost equal to u1 outside (arrow B), so the variance of u2 is dependent on u1. This nonlinear dependency is closely approximated by the mixture model in (b), but not in (c).)\n\n3.2 Unsupervised Image Segmentation\n\nThe idea behind our model is that an image can be modeled as a mixture of variance-correlated \u201ccontexts\u201d. We show how the learned model can be used to classify different contexts in an unsupervised image segmentation task. Given the learned model and data, we can compute the expectation of the hidden variable Z from Eq.(9). For each image patch, we then select the Laplacian distribution with the highest probability, i.e. the most explanatory Laplacian or \u201ccontext\u201d. For segmentation, we use the model with 16 Laplacians, which enables an abstract partitioning of images and lets us visualize their organization more clearly (figure 5).\n\nFigure 5: Unsupervised image segmentation (left: original image; middle: color-labeled image; right: color-coded Laplacians with variance structure. Each color corresponds to a Laplacian distribution, which represents the surface or textural organization of the underlying context. Laplacian #14 captures smooth surfaces and Laplacian #9 captures the contrast between clear sky and textured ground.)\n\n3.3 Application to Image Restoration\n\nThe proposed mixture model provides a better parametric model of the ICA source distribution and hence an improved model of image structure. One advantage is in the MAP (maximum a posteriori) estimation of a noisy image. If we assume Gaussian noise n, the image generation model can be written as Eq.(15). 
Then we can compute the MAP estimate of the ICA source signal u by Eq.(16) and reconstruct the original image.\n\nX = Au + n   (15)\n\nû = argmax_u log P(u|X,A) = argmax_u ( log P(X|u,A) + log P(u) )   (16)\n\nSince we assumed Gaussian noise, P(X|u,A) in Eq.(16) is Gaussian. P(u) in Eq.(16) can be modeled by a single Laplacian or by a mixture of Laplacian distributions; the mixture distribution can be approximated by its maximum-explaining Laplacian. We evaluated 3 different methods for image restoration: ICA MAP estimation with a simple Laplacian prior, the same with the Laplacian mixture prior, and the Wiener filter. Figure 6 shows an example, and figure 7 summarizes the results obtained at different noise levels. MAP estimation with the mixture prior performs better than the others in terms of both SNR and SSIM (Structural Similarity Measure) [10].\n\nFigure 6: Image restoration results (signal variance 1.0, noise variance 0.81)\n\nFigure 7: SNR and SSIM as a function of noise variance for the 3 algorithms (ICA MAP with mixture prior, ICA MAP with Laplacian prior, Wiener filter; signal variance = 1.0)\n\n4 Discussion\n\nWe proposed a mixture model to learn nonlinear dependencies of ICA source signals for natural images. The proposed mixture-of-Laplacian model is a generalization of the conventional independent source priors and can model variance dependency in natural image signals. 
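For intuition, when the basis A is orthonormal (so W = A^T), the MAP problem of Eq.(16) with a Laplacian prior decouples across coefficients and has a closed-form soft-thresholding solution, essentially the sparse code shrinkage of [2]. The sketch below assumes that orthonormal case and is not the paper's exact solver; with the mixture prior, one would first select the maximum-explaining Laplacian via Eq.(9) and then use its scales:

```python
import numpy as np

def map_denoise(X, A, sigma2, lam):
    """MAP estimate of Eq. (16) for Gaussian noise with variance sigma2 and a
    Laplacian prior with per-dimension scales lam, ASSUMING A is orthonormal
    so the problem decouples per coefficient (soft thresholding).
    X holds patches as columns."""
    u0 = A.T @ X                                   # noisy source estimate, W = A^-1 = A^T
    thr = sigma2 / np.asarray(lam)                 # per-dimension threshold from the prior
    u_hat = np.sign(u0) * np.maximum(np.abs(u0) - thr[:, None], 0.0)
    return A @ u_hat                               # reconstruct the denoised patches

# With the mixture prior, pick the most probable Laplacian k for the patch
# (Eq. 9) and pass its scale vector as lam.
A = np.eye(2)
X = np.array([[3.0], [0.1]])
print(map_denoise(X, A, sigma2=1.0, lam=[1.0, 1.0]))  # small coefficients shrink to zero
```

Context-dependent thresholds are precisely where the mixture prior helps: a "smooth surface" Laplacian shrinks aggressively, while a "textured" Laplacian with large scales preserves edge coefficients.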
Experiments show that the proposed model can learn variance-correlated signals grouped into different mixtures, and can learn high-level structures that are highly correlated with the underlying physical properties captured in the image. Our model provides an analytic prior over nearly independent, variance-correlated signals, which was not available in previous models [4,5,6,7,8].\n\nThe learned variances of the mixture model show structured localization in image and frequency space, similar to the results in [8]. Since the model is given no information about the spatial location or frequency of the source signals, we can conclude that the dependency captured by the mixture model reveals regularity in natural images. As shown in the image labeling experiments, such regularities correspond to specific surface types (textures) or to boundaries between surfaces.\n\nThe learned mixture model can be used to discover the hidden contexts that generated such regularity or correlated signal groups. Experiments also show that the labeling of image patches is highly correlated with the object surface types shown in the image. The segmentation results show regularity across image space and strong correlation with high-level concepts.\n\nFinally, we showed an application of the model to image restoration. We compared its performance with conventional ICA MAP estimation and the Wiener filter. Our results suggest that the proposed model outperforms these traditional methods, owing to its estimation of the correlated variance structure, which provides an improved prior not considered in the other methods.\n\nIn future work, we plan to exploit the regularity of the image segmentation results to learn more high-level structures by building additional hierarchies on the current model. Furthermore, the application to image coding seems promising.\n\nReferences\n\n[1] A. J. Bell and T. J. 
Sejnowski, The 'Independent Components' of Natural Scenes are Edge Filters, Vision Research, 37(23):3327-3338, 1997.\n[2] A. Hyvarinen, Sparse Code Shrinkage: Denoising of Nongaussian Data by Maximum Likelihood Estimation, Neural Computation, 11(7):1739-1768, 1999.\n[3] T. Lee, M. Lewicki, and T. Sejnowski, ICA Mixture Models for Unsupervised Classification of Non-Gaussian Classes and Automatic Context Switching in Blind Separation, IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(10), October 2000.\n[4] J. Portilla, V. Strela, M. J. Wainwright and E. P. Simoncelli, Image Denoising Using Scale Mixtures of Gaussians in the Wavelet Domain, IEEE Trans. on Image Processing, 12(11):1338-1351, 2003.\n[5] A. Hyvarinen and P. O. Hoyer, Emergence of Phase and Shift Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces, Neurocomputing, 1999.\n[6] A. Hyvarinen and P. O. Hoyer, Topographic Independent Component Analysis as a Model of V1 Receptive Fields, Neurocomputing, 38-40, June 2001.\n[7] M. Welling, G. E. Hinton, and S. Osindero, Learning Sparse Topographic Representations with Products of Student-t Distributions, NIPS, 2002.\n[8] M. S. Lewicki and Y. Karklin, Learning Higher-Order Structures in Natural Images, Network: Comput. Neural Syst., 14:483-499, August 2003.\n[9] A. Hyvarinen and P. O. Hoyer, FastICA Matlab code, http://www.cis.hut.fi/projects/compneuro/extensions.html/\n[10] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, The SSIM Index for Image Quality Assessment, IEEE Transactions on Image Processing, 13(4), April 2004.", "award": [], "sourceid": 2689, "authors": [{"given_name": "Hyun", "family_name": "Park", "institution": null}, {"given_name": "Te", "family_name": "Lee", "institution": null}]}