{"title": "Implicit Wiener Series for Higher-Order Image Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 465, "page_last": 472, "abstract": null, "full_text": "Implicit Wiener Series for Higher-Order Image Analysis\n\nMatthias O. Franz, Bernhard Sch\u00f6lkopf\nMax-Planck-Institut f\u00fcr biologische Kybernetik\nSpemannstr. 38, D-72076 T\u00fcbingen, Germany\nmof;bs@tuebingen.mpg.de\n\nAbstract\n\nThe computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images.\n\nMost of the interesting structure in a natural image is characterized by its higher-order statistics. Arbitrarily oriented lines and edges, for instance, cannot be described by the usual pairwise statistics such as the power spectrum or the autocorrelation function: from knowing the intensity of one point on a line alone, we cannot predict its neighbouring intensities. This would require knowledge of a second point on the line, i.e., we have to consider some third-order statistics which describe the interactions between triplets of points. 
Analogously, the prediction of a corner neighbourhood needs at least fourth-order statistics, and so on.\n\nIn terms of Fourier analysis, higher-order image structures such as edges or corners are described by phase alignments, i.e., phase correlations between several Fourier components of the image. Classically, harmonic phase interactions are measured by higher-order spectra [4]. Unfortunately, the estimation of these spectra for high-dimensional signals such as images involves the estimation and interpretation of a huge number of terms. For instance, a sixth-order spectrum of a $16 \\times 16$ image contains roughly $10^{12}$ coefficients, about $10^{10}$ of which would have to be estimated independently if all symmetries in the spectrum are considered. First attempts at estimating the higher-order structure of natural images were therefore restricted to global measures such as skewness or kurtosis [8], or to submanifolds of fourth-order spectra [9].\n\nHere, we propose an alternative approach that models the interactions of image points in a series of Wiener functionals. A Wiener functional of order $n$ captures those image components that can be predicted from the multiplicative interaction of $n$ image points. In contrast to higher-order spectra or moments, the estimation of a Wiener model does not require the estimation of an excessive number of terms since it can be computed implicitly via polynomial kernels. This allows us to decompose an image into components that are characterized by interactions of a given order.\n\nIn the next section, we introduce the Wiener expansion and discuss its capability of modeling higher-order pixel interactions. The implicit estimation method is described in Sect. 2, followed by some examples of use in Sect. 3. We conclude in Sect. 
4 by briefly discussing the results and possible improvements.\n\n\n1 Modeling pixel interactions with Wiener functionals\n\nFor our analysis, we adopt a prediction framework: given a $d \\times d$ neighbourhood of an image pixel, we want to predict its gray value from the gray values of the neighbours. We are particularly interested in the extent to which interactions of different orders contribute to the overall prediction. Our basic assumption is that the dependency of the central pixel value $y$ on its neighbours $x_i$, $i = 1, \\dots, m = d^2 - 1$, can be modeled as a series\n\n$$y = H_0[x] + H_1[x] + H_2[x] + \\cdots + H_n[x] + \\cdots$$ (1)\n\nof discrete Volterra functionals\n\n$$H_0[x] = h_0 = \\text{const.} \\quad \\text{and} \\quad H_n[x] = \\sum_{i_1=1}^{m} \\cdots \\sum_{i_n=1}^{m} h^{(n)}_{i_1 \\dots i_n} x_{i_1} \\cdots x_{i_n}.$$ (2)\n\nHere, we have stacked the gray values of the neighbourhood into the vector $x = (x_1, \\dots, x_m)^\\top \\in \\mathbb{R}^m$. The discrete $n$th-order Volterra functional is, accordingly, a linear combination of all ordered $n$th-order monomials of the elements of $x$ with $m^n$ coefficients $h^{(n)}_{i_1 \\dots i_n}$. Volterra functionals provide a controlled way of introducing multiplicative interactions of image points since a functional of order $n$ contains all products of the input of order $n$. 
In terms of higher-order statistics, this means that we can control the order of the statistics used since an $n$th-order Volterra series leads to dependencies between maximally $n + 1$ pixels.\n\nUnfortunately, Volterra functionals are not orthogonal to each other, i.e., depending on the input distribution, a functional of order $n$ generally leads to additional lower-order interactions. As a result, the output of the functional will contain components that are proportional to that of some lower-order monomials. For instance, the output of a second-order Volterra functional for Gaussian input generally has a mean different from zero [5]. If one wants to estimate the zeroth-order component of an image (i.e., the constant component created without pixel interactions), the constant component created by the second-order interactions needs to be subtracted. For general Volterra series, this correction can be achieved by decomposing it into a new series $y = G_0[x] + G_1[x] + \\cdots + G_n[x] + \\cdots$ of functionals $G_n[x]$ that are uncorrelated, i.e., orthogonal with respect to the input. The resulting Wiener functionals$^1$ $G_n[x]$ are linear combinations of Volterra functionals up to order $n$. They are computed from the original Volterra series by a procedure akin to Gram-Schmidt orthogonalization [5]. It can be shown that any Wiener expansion of finite degree minimizes the mean squared error between the true system output and its Volterra series model [5]. The orthogonality condition ensures that a Wiener functional of order $n$ captures only the component of the image created by the multiplicative interaction of $n$ pixels. 
In contrast to general Volterra functionals, a Wiener functional is orthogonal to all monomials of lower order [5].\n\nSo far, we have not gained anything compared to classical estimation of higher-order moments or spectra: an $n$th-order Volterra functional contains the same number of terms as the corresponding $(n + 1)$th-order spectrum, and a Wiener functional of the same order has an even higher number of coefficients as it also consists of lower-order Volterra functionals. In the next section, we will introduce an implicit representation of the Wiener series using polynomial kernels which allows for an efficient computation of the Wiener functionals.\n\n$^1$ Strictly speaking, the term Wiener functional is reserved for orthogonal Volterra functionals with respect to Gaussian input. Here, the term will be used for orthogonalized Volterra functionals with arbitrary input distributions.\n\n\n2 Estimating Wiener series by regression in RKHS\n\nVolterra series as linear functionals in RKHS. The $n$th-order Volterra functional is a weighted sum of all $n$th-order monomials of the input vector $x$. We can interpret the evaluation of this functional for a given input $x$ as a map $\\phi_n$ defined for $n = 0, 1, 2, \\dots$ as\n\n$$\\phi_0(x) = 1 \\quad \\text{and} \\quad \\phi_n(x) = (x_1^n, x_1^{n-1} x_2, \\dots, x_1 x_2^{n-1}, x_2^n, \\dots, x_m^n)$$ (3)\n\nsuch that $\\phi_n$ maps the input $x \\in \\mathbb{R}^m$ into a vector $\\phi_n(x) \\in F_n = \\mathbb{R}^{m^n}$ containing all $m^n$ ordered monomials of degree $n$. Using $\\phi_n$, we can write the $n$th-order Volterra functional in Eq. 
(2) as a scalar product in $F_n$,\n\n$$H_n[x] = \\eta_n^\\top \\phi_n(x),$$ (4)\n\nwith the coefficients stacked into the vector $\\eta_n = (h^{(n)}_{1,1,\\dots,1}, h^{(n)}_{1,2,\\dots,1}, h^{(n)}_{1,3,\\dots,1}, \\dots)^\\top \\in F_n$. The same idea can be applied to the entire $p$th-order Volterra series. By stacking the maps $\\phi_n$ into a single map $\\phi^{(p)}(x) = (\\phi_0(x), \\phi_1(x), \\dots, \\phi_p(x))^\\top$, one obtains a mapping from $\\mathbb{R}^m$ into $F^{(p)} = \\mathbb{R} \\oplus \\mathbb{R}^m \\oplus \\mathbb{R}^{m^2} \\oplus \\cdots \\oplus \\mathbb{R}^{m^p} = \\mathbb{R}^M$ with dimensionality $M = \\frac{1 - m^{p+1}}{1 - m}$. The entire $p$th-order Volterra series can be written as a scalar product in $F^{(p)}$\n\n$$\\sum_{n=0}^{p} H_n[x] = (\\eta^{(p)})^\\top \\phi^{(p)}(x)$$ (5)\n\nwith $\\eta^{(p)} \\in F^{(p)}$. Below, we will show how we can express $\\eta^{(p)}$ as an expansion in terms of the training points. This will dramatically reduce the number of parameters we have to estimate.\n\nThis procedure works because the space $F_n$ of $n$th-order monomials has a very special property: it has the structure of a reproducing kernel Hilbert space (RKHS). As a consequence, the dot product in $F_n$ can be computed by evaluating a positive definite kernel function $k_n(x_1, x_2)$. For monomials, one can easily show that (e.g., [6])\n\n$$\\phi_n(x_1)^\\top \\phi_n(x_2) = (x_1^\\top x_2)^n =: k_n(x_1, x_2).$$ 
(6)\n\nSince $F^{(p)}$ is generated as a direct sum of the single spaces $F_n$, the associated scalar product is simply the sum of the scalar products in the $F_n$:\n\n$$\\phi^{(p)}(x_1)^\\top \\phi^{(p)}(x_2) = \\sum_{n=0}^{p} (x_1^\\top x_2)^n = k^{(p)}(x_1, x_2).$$ (7)\n\nThus, we have shown that the discretized Volterra series can be expressed as a linear functional in an RKHS$^2$.\n\n$^2$ A similar approach has been taken by [1] using the inhomogeneous polynomial kernel $k^{(p)}_{\\text{inh}}(x_1, x_2) = (1 + x_1^\\top x_2)^p$. This kernel implies a map $\\phi_{\\text{inh}}$ into the same space of monomials, but it weights the degrees of the monomials differently as can be seen by applying the binomial theorem.\n\nLinear regression in RKHS. For our prediction problem (1), the RKHS property of the Volterra series leads to an efficient solution which is in part due to the so-called representer theorem (e.g., [6]). It states the following: suppose we are given $N$ observations $(x_1, y_1), \\dots, (x_N, y_N)$ of the function (1) and an arbitrary cost function $c$, $\\Omega$ is a nondecreasing function on $\\mathbb{R}_{>0}$ and $\\|\\cdot\\|_F$ is the norm of the RKHS associated with the kernel $k$. If we minimize an objective function\n\n$$c((x_1, y_1, f(x_1)), \\dots, (x_N, y_N, f(x_N))) + \\Omega(\\|f\\|_F)$$ (8)\n\nover all functions in the RKHS, then an optimal solution$^3$ can be expressed as\n\n$$f(x) = \\sum_{j=1}^{N} a_j k(x, x_j), \\quad a_j \\in \\mathbb{R}.$$ (9)\n\nIn other words, although we optimized over the entire RKHS including functions which are defined for arbitrary input points, it turns out that we can always express the solution in terms of the observations $x_j$ only. 
Hence the optimization problem over the extremely large number of coefficients $\\eta^{(p)}$ in Eq. (5) is transformed into one over $N$ variables $a_j$.\n\nLet us consider the special case where the cost function is the mean squared error, $c((x_1, y_1, f(x_1)), \\dots, (x_N, y_N, f(x_N))) = \\frac{1}{N} \\sum_{j=1}^{N} (f(x_j) - y_j)^2$, and the regularizer $\\Omega$ is zero$^4$. The solution for $a = (a_1, \\dots, a_N)^\\top$ is readily computed by setting the derivative of (8) with respect to the vector $a$ equal to zero; it takes the form $a = K^{-1} y$ with the Gram matrix defined as $K_{ij} = k(x_i, x_j)$, hence$^5$\n\n$$y = f(x) = a^\\top z(x) = y^\\top K^{-1} z(x),$$ (10)\n\nwhere $z(x) = (k(x, x_1), k(x, x_2), \\dots, k(x, x_N))^\\top \\in \\mathbb{R}^N$.\n\nImplicit Wiener series estimation. As we stated above, the $p$th-degree Wiener expansion is the $p$th-order Volterra series that minimizes the squared error. This can be put into the regression framework: since any finite Volterra series can be represented as a linear functional in the corresponding RKHS, we can find the $p$th-order Volterra series that minimizes the squared error by linear regression. This, by definition, must be the $p$th-degree Wiener series since no other Volterra series has this property$^6$. From Eqn. 
(10), we obtain the following expressions for the implicit Wiener series\n\n$$G_0[x] = \\frac{1}{N} y^\\top \\mathbf{1}, \\qquad \\sum_{n=0}^{p} G_n[x] = \\sum_{n=0}^{p} H_n[x] = y^\\top K_p^{-1} z^{(p)}(x)$$ (11)\n\nwhere the Gram matrix $K_p$ and the coefficient vector $z^{(p)}(x)$ are computed using the kernel from Eq. (7) and $\\mathbf{1} = (1, 1, \\dots)^\\top \\in \\mathbb{R}^N$. Note that the Wiener series is represented only implicitly since we are using the RKHS representation as a sum of scalar products with the training points. Thus, we can avoid the \"curse of dimensionality\", i.e., there is no need to compute the possibly large number of coefficients explicitly.\n\nThe explicit Volterra and Wiener expansions can be recovered from Eq. (11) by collecting all terms containing monomials of the desired order and summing them up. The individual $n$th-order Volterra functionals in a Wiener series of degree $p > 0$ are given implicitly by\n\n$$H_n[x] = y^\\top K_p^{-1} z_n(x)$$ (12)\n\nwith $z_n(x) = ((x_1^\\top x)^n, (x_2^\\top x)^n, \\dots, (x_N^\\top x)^n)^\\top$. For $p = 0$ the only term is the constant zero-order Volterra functional $H_0[x] = G_0[x]$. The coefficient vector $\\eta_n = (h^{(n)}_{1,1,\\dots,1}, h^{(n)}_{1,2,\\dots,1}, h^{(n)}_{1,3,\\dots,1}, \\dots)^\\top$ of the explicit Volterra functional is obtained as\n\n$$\\eta_n = \\Phi_n^\\top K_p^{-1} y$$ (13)\n\nusing the design matrix $\\Phi_n = (\\phi_n(x_1)^\\top, \\phi_n(x_2)^\\top, \\dots, \\phi_n(x_N)^\\top)^\\top$. The individual Wiener functionals can only be recovered by applying the regression procedure twice. If we are interested in the $n$th-degree Wiener functional, we have to compute the solution for the kernels $k^{(n)}(x_1, x_2)$ and $k^{(n-1)}(x_1, x_2)$. The Wiener functional for $n > 0$ is then obtained from the difference of the two results as\n\n$$G_n[x] = \\sum_{i=0}^{n} G_i[x] - \\sum_{i=0}^{n-1} G_i[x] = y^\\top \\left[ K_n^{-1} z^{(n)}(x) - K_{n-1}^{-1} z^{(n-1)}(x) \\right].$$ (14)\n\nThe corresponding $i$th-order Volterra functionals of the $n$th-degree Wiener functional are computed analogously to Eqns. (12) and (13) [3].\n\n$^3$ For conditions on uniqueness of the solution, see [6].\n$^4$ Note that this is different from the regularized approach used by [1]. If $\\Omega$ is not zero, the resulting Volterra series are different from the Wiener series since they are not orthogonal with respect to the input.\n$^5$ If $K$ is not invertible, $K^{-1}$ denotes the pseudo-inverse of $K$.\n$^6$ Assuming symmetrized Volterra kernels, which can be obtained from any Volterra expansion.\n\nOrthogonality. The resulting Wiener functionals must fulfill the orthogonality condition which in its strictest form states that a $p$th-degree Wiener functional must be orthogonal to all monomials in the input of lower order. Formally, we will prove the following\n\nTheorem 1 The functionals obtained from Eq. 
(14) fulfill the orthogonality condition\n\n$$E[m(x) G_p[x]] = 0$$ (15)\n\nwhere $E$ denotes the expectation over the input distribution and $m(x)$ an arbitrary $i$th-order monomial with $i < p$.\n\nWe will show that this is a consequence of the least squares fit of any linear expansion in a set of basis functions of the form $y = \\sum_{j=1}^{M} \\gamma_j \\varphi_j(x)$. In the case of the Wiener and Volterra expansions, the basis functions $\\varphi_j(x)$ are monomials of the components of $x$.\n\nWe denote the error of the expansion as $e(x) = y - \\sum_{j=1}^{M} \\gamma_j \\varphi_j(x)$. The minimum of the expected quadratic loss $L$ with respect to the expansion coefficient $\\gamma_k$ is given by\n\n$$\\frac{\\partial L}{\\partial \\gamma_k} = \\frac{\\partial}{\\partial \\gamma_k} E[\\|e(x)\\|^2] = -2 E[\\varphi_k(x) e(x)] = 0.$$ (16)\n\nThis means that, for an expansion in a set of basis functions minimizing the squared error, the error is orthogonal to all basis functions used in the expansion.\n\nNow let us assume we know the Wiener series expansion (which minimizes the mean squared error) of a system up to degree $p - 1$. The approximation error is given by the sum of the higher-order Wiener functionals $e(x) = \\sum_{n=p}^{\\infty} G_n[x]$, so $G_p[x]$ is part of the error. As a consequence of the linearity of the expectation, Eq. 
(16) implies\n\n$$\\sum_{n=p}^{\\infty} E[\\varphi_k(x) G_n[x]] = 0 \\quad \\text{and} \\quad \\sum_{n=p+1}^{\\infty} E[\\varphi_k(x) G_n[x]] = 0$$ (17)\n\nfor any basis function $\\varphi_k$ of order less than $p$. The difference of both equations yields $E[\\varphi_k(x) G_p[x]] = 0$, so that $G_p[x]$ must be orthogonal to any of the lower order basis functions, namely to all monomials with order smaller than $p$.\n\n\n3 Experiments\n\nToy examples. In our first experiment, we check whether our intuitions about higher-order statistics described in the introduction are captured by the proposed method. In particular, we expect that arbitrarily oriented lines can only be predicted using third-order statistics. As a consequence, we should need at least a second-order Wiener functional to predict lines correctly.\n\nOur first test image (size $80 \\times 110$, upper row in Fig. 1) contains only lines of varying orientations. Choosing a $5 \\times 5$ neighbourhood, we predicted the central pixel using (11).\n\n[Figure 1 panels, left to right: original image; 0th-order component/reconstruction; 1st-, 2nd- and 3rd-order reconstructions, each followed by the corresponding component. Reconstruction errors for the 1st-, 2nd- and 3rd-order reconstructions: first image mse = 583.7, 0.006, 0; second image mse = 1317, 37.4, 0.001; third image mse = 1845, 334.9, 19.0.]\n\nFigure 1: Higher-order components of toy images. The image components of different orders are created by the corresponding Wiener functionals. They are added up to obtain the different orders of reconstruction. 
Note that the constant 0-order component and reconstruction are identical. The reconstruction error (mse) is given as the mean squared error between the true grey values of the image and the reconstruction. Although the linear first-order model seems to reconstruct the lines, this is actually not true since the linear model just smooths over the image (note its large reconstruction error). A correct prediction is only obtained by adding a second-order component to the model. The third-order component is only significant at crossings, corners and line endings.\n\nModels of orders 0 to 3 were learned from the image by extracting the maximal training set$^7$ of $76 \\times 106$ patches of size $5 \\times 5$. The corresponding image components of order 0 to 3 were computed according to (14). Note that the different components generated by the Wiener functionals can also be negative. In Fig. 1, they are scaled to the gray values [0..255]. The behaviour of the models conforms to our intuition: the linear model cannot capture the line structure of the image, thus leading to a large reconstruction error which drops to nearly zero when a second-order model is used. The additional small correction achieved by the third-order model is mainly due to discretization effects.\n\nSimilar to lines, we expect that we need at least a third-order model to predict crossings or corners correctly. This is confirmed by the second and third test images shown in the corresponding rows in Fig. 1. Note that the third-order component is only significant at crossings, corners and line endings. The fourth- and fifth-order terms (not shown) have only negligible contributions. The fact that the reconstruction error does not drop to zero for the third image is caused by the line endings which cannot be predicted to a higher accuracy than one pixel.\n\nApplication to natural images. Are there further predictable structures in natural images that are not due to lines, crossings or corners? 
This can be investigated by applying our method to a set of natural images (an example of size $80 \\times 110$ is depicted in Fig. 2). Our results on a set of 10 natural images of size $50 \\times 70$ show an approximately exponential decay of the reconstruction error when more and more higher-order terms are added to the reconstruction (Fig. 3). Interestingly, terms up to order 5 still play a significant role, although the image regions with a significant component become sparser with increasing model order (see Fig. 2). Note that the nonlinear terms reduce the reconstruction error to almost 0. This suggests a high degree of higher-order redundancy in natural images that cannot be exploited by the usual linear prediction models.\n\n$^7$ In contrast to the usual setting in machine learning, training and test set are identical in our case since we are not interested in generalization to other images, but in analyzing the higher-order components of the image at hand.\n\n[Figure 2 panels: original image; 0th-order component/reconstruction; 1st- to 8th-order reconstructions, each followed by the corresponding component. Reconstruction errors: 1st order mse = 1070; 2nd order mse = 957.4; 3rd order mse = 414.6; 4th order mse = 98.5; 5th order mse = 18.5; 6th order mse = 4.98; 7th order mse = 1.32; 8th order mse = 0.41.]\n\nFigure 2: Higher-order components and reconstructions of a photograph. Interactions up to the fifth order play an important role. Note that significant components become sparser with increasing model order.\n\n[Figure 3 plot: mse (axis range 0 to 700) against model order (0 to 7).]\n\nFigure 3: Mean square reconstruction error of models of different order for a set of 10 natural images.\n\n\n4 Conclusion\n\nThe implicit estimation of Wiener functionals via polynomial kernels opens up new possibilities for the estimation of higher-order image statistics. Compared to the classical methods such as higher-order spectra, moments or cumulants, our approach avoids the combinatorial explosion caused by the exponential increase of the number of terms to be estimated and interpreted. When put into a predictive framework, multiplicative pixel interactions of different orders are easily visualized and conform to the intuitive notions about image structures such as edges, lines, crossings or corners.\n\nThere is no one-to-one mapping between the classical higher-order statistics and multiplicative pixel interactions. Any nonlinear Wiener functional, for instance, creates infinitely many correlations or cumulants of higher order, and often also of lower order. On the other hand, a Wiener functional of order $n$ produces only harmonic phase interactions up to order $n + 1$, but sometimes also of lower orders. 
Thus, when one analyzes a classical statistic of a given order, one often cannot determine by which order of pixel interaction it was created. In contrast, our method is able to isolate image components that are created by a single order of interaction.\n\nAlthough of preliminary nature, our results on natural images suggest an important role of statistics up to the fifth order. Most of the currently used low-level feature detectors such as edge or corner detectors maximally use third-order interactions. The investigation of fourth- or higher-order features is a field that might lead to new insights into the nature and role of higher-order image structures.\n\nAs often observed in the literature (e.g., [2], [7]), our results seem to confirm that a large proportion of the redundancy in natural images is contained in the higher-order pixel interactions. Before any further conclusions can be drawn, however, our study needs to be extended in several directions: 1. A representative image database has to be analyzed. The images must be carefully calibrated since nonlinear statistics can be highly calibration-sensitive. In addition, the contribution of image noise has to be investigated. 2. Currently, only images up to 9000 pixels can be analyzed due to the matrix inversion required by Eq. (11). To accommodate larger images, our method has to be reformulated as an iterative algorithm. 3. So far, we only considered $5 \\times 5$ patches. To systematically investigate patch size effects, the analysis has to be conducted in a multi-scale framework.\n\n\nReferences\n\n[1] T. J. Dodd and R. F. Harrison. A new solution to Volterra series estimation. In CD-Rom Proc. 2002 IFAC World Congress, 2002.\n\n[2] D. J. Field. What is the goal of sensory coding? Neural Computation, 6:559-601, 1994.\n\n[3] M. O. Franz and B. Sch\u00f6lkopf. Implicit Wiener series. 
Technical Report 114, Max-Planck-Institut f\u00fcr biologische Kybernetik, T\u00fcbingen, June 2003.\n\n[4] C. L. Nikias and A. P. Petropulu. Higher-order spectra analysis. Prentice Hall, Englewood Cliffs, NJ, 1993.\n\n[5] M. Schetzen. The Volterra and Wiener theories of nonlinear systems. Krieger, Malabar, 1989.\n\n[6] B. Sch\u00f6lkopf and A. J. Smola. Learning with kernels. MIT Press, Cambridge, MA, 2002.\n\n[7] O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neurosc., 4(8):819-825, 2001.\n\n[8] M. G. A. Thomson. Higher-order structure in natural scenes. J. Opt. Soc. Am. A, 16(7):1549-1553, 1999.\n\n[9] M. G. A. Thomson. Beats, kurtosis and visual coding. Network: Comput. Neural Syst., 12:271-287, 2001.\n", "award": [], "sourceid": 2611, "authors": [{"given_name": "Matthias", "family_name": "Franz", "institution": null}, {"given_name": "Bernhard", "family_name": "Sch\u00f6lkopf", "institution": null}]}