{"title": "Semi-supervised Regression via Parallel Field Regularization", "book": "Advances in Neural Information Processing Systems", "page_first": 433, "page_last": 441, "abstract": "This paper studies the problem of semi-supervised learning from the vector field perspective. Many of the existing work use the graph Laplacian to ensure the smoothness of the prediction function on the data manifold. However, beyond smoothness, it is suggested by recent theoretical work that we should ensure second order smoothness for achieving faster rates of convergence for semi-supervised regression problems. To achieve this goal, we show that the second order smoothness measures the linearity of the function, and the gradient field of a linear function has to be a parallel vector field. Consequently, we propose to find a function which minimizes the empirical error, and simultaneously requires its gradient field to be as parallel as possible. We give a continuous objective function on the manifold and discuss how to discretize it by using random points. The discretized optimization problem turns out to be a sparse linear system which can be solved very efficiently. The experimental results have demonstrated the effectiveness of our proposed approach.", "full_text": "Semi-supervised Regression via\nParallel Field Regularization\n\nBinbin Lin\n\nState Key Lab of CAD&CG, College of Computer Science, Zhejiang University\n{binbinlinzju, chiyuan.zhang.zju, xiaofeihe}@gmail.com\n\nHangzhou 310058, China\n\nChiyuan Zhang\n\nXiaofei He\n\nAbstract\n\nThis paper studies the problem of semi-supervised learning from the vector \ufb01eld\nperspective. Many of the existing work use the graph Laplacian to ensure the\nsmoothness of the prediction function on the data manifold. However, beyond\nsmoothness, it is suggested by recent theoretical work that we should ensure\nsecond order smoothness for achieving faster rates of convergence for semi-\nsupervised regression problems. 
To achieve this goal, we show that second order smoothness measures the linearity of the function, and that the gradient field of a linear function has to be a parallel vector field. Consequently, we propose to find a function which minimizes the empirical error, and simultaneously requires its gradient field to be as parallel as possible. We give a continuous objective function on the manifold and discuss how to discretize it by using random points. The discretized optimization problem turns out to be a sparse linear system which can be solved very efficiently. The experimental results have demonstrated the effectiveness of our proposed approach.\n\n1 Introduction\n\nIn many machine learning problems, one is often confronted with very high dimensional data. There is a strong intuition that the data may have a lower dimensional intrinsic representation. Various researchers have considered the case when the data is sampled from a submanifold embedded in the ambient Euclidean space. Consequently, learning with the low dimensional manifold structure, or specifically the intrinsic topological and geometrical properties of the data manifold, becomes a crucial problem.\nIn the past decade, many geometrically motivated approaches have been developed. The early work mainly considers the problem of dimensionality reduction. One hopes that the manifold structure could be preserved in the much lower dimensional Euclidean space. For example, ISOMAP [1] is a global approach which tries to preserve the pairwise geodesic distances on the manifold. Different from ISOMAP, Hessian Eigenmaps (HLLE, [2]) is a local approach for a similar purpose.
Locally Linear Embedding (LLE, [3]) and Laplacian Eigenmaps (LE, [4]) can be viewed as Laplacian operator based methods which mainly consider the local neighborhood structure of the manifold.\nBesides dimensionality reduction, Laplacian based regularization has also been widely employed in semi-supervised learning. These methods construct a nearest neighbor graph over the labeled and unlabeled data to model the underlying manifold structure, and use the graph Laplacian [5] to measure the smoothness of the learned function on the manifold. A variety of semi-supervised learning approaches using the graph Laplacian have been proposed [6, 7, 8]. In semi-supervised regression, some recent theoretical analysis [9] shows that using the graph Laplacian regularizer does not lead to faster minimax rates of convergence. [9] also states that the Laplacian regularizer is far too general for measuring the smoothness of the function. It is further suggested that we should ensure second order smoothness to achieve faster rates of convergence for semi-supervised regression problems. The Laplacian regularizer is the integral of the norm of the gradient of the function, i.e., of the first order derivative of the function.\nIn this paper, we design regularization terms that penalize the second order smoothness of the function, i.e., the linearity of the function. Estimating the second order covariant derivative of the function is a very challenging problem. We try to address this problem from the vector field perspective. We show that the gradient field of a linear function has to be a parallel vector field (or parallel field in short). Consequently, we propose a novel approach called Parallel Field Regularization (PFR) to simultaneously find the function and its gradient field, while requiring the gradient field to be as parallel as possible.
Specifically, we propose to compute a function and a vector field which satisfy three conditions simultaneously: 1) the function minimizes the empirical error on the labeled data, 2) the vector field is close to the gradient field of the function, 3) the vector field should be as parallel as possible. A novel regularization framework from the vector field perspective is developed. We give both the continuous and discrete forms of the objective function, and develop an efficient optimization scheme to solve this problem.\n\n2 Regularization on the Vector Field\n\nWe first briefly introduce semi-supervised learning methods with regularization on the function. Let M be a d-dimensional submanifold in R^m. Given l labeled data points (xi, yi)_{i=1}^l on M, we aim to learn a function f : M → R based on the manifold M and the labeled points (xi, yi)_{i=1}^l. A framework of semi-supervised learning based on differential operators can be formulated as follows:\n\narg min_{f∈C∞(M)} E(f) = (1/l) Σ_{i=1}^l R0(f(xi), yi) + λ1 R1(f)\n\nwhere C∞(M) denotes smooth functions on M, R0 : R × R → R is the loss function and R1 : C∞(M) → R is a regularization functional. R1 is often written as a functional norm associated with a differential operator, i.e., R1(f) = ∫_M ‖Df‖², where D is a differential operator. If D is the covariant derivative ∇ on the manifold, then R1(f) = ∫_M ‖∇f‖² = ∫_M f L(f) becomes the Laplacian regularizer. If D is the Hessian operator on the manifold, then R1(f) = ∫_M ‖Hess f‖² becomes the Hessian regularizer.\n\n2.1 Parallel Fields and Linear Functions\n\nWe first show the relationship between a parallel field and a linear function on the manifold.\nDefinition 2.1 (Parallel Field [10]).
A vector field X on the manifold M is a parallel field if\n\n∇X ≡ 0,\n\nwhere ∇ is the covariant derivative on M.\nDefinition 2.2 (Linear Function [10]). A continuous function f : M → R is said to be linear if\n\n(f ∘ γ)(t) = f(γ(0)) + ct    (1)\n\nfor each geodesic γ.\n\nA function f being linear means that it varies linearly along the geodesics of the manifold. It is a natural extension of linear functions on Euclidean space.\nProposition 2.1. [10] Let V be a parallel field on the manifold. If it is also the gradient field of a function f, V = ∇f, then f is a linear function on the manifold.\nThis proposition tells us the relationship between a parallel field and a linear function on the manifold.\n\nFigure 1: Covariant derivative demonstration. Let V, Y be two vector fields on the manifold M. Given a point x ∈ M, we show how to compute the vector ∇_Y V|_x. Let γ(t) be a curve on M, γ : I → M, which satisfies γ(0) = x and γ′(0) = Y_x. Then the covariant derivative along the direction dγ(t)/dt|_{t=0} can be computed by projecting dV/dt|_{t=0} to the tangent space T_xM at x. In other words, ∇_{γ′(0)}V|_x = P_x(dV/dt|_{t=0}), where P_x : v ∈ R^m → P_x(v) ∈ T_xM is the projection matrix. It is not difficult to check that the computation of ∇_Y V|_x is independent of the choice of the curve γ.\n\n2.2 Objective Function\n\nWe aim to design regularization terms that penalize the second order smoothness of the function. Following the above analysis, we first approximate the gradient field of the prediction function by a vector field, and then require the vector field to be as parallel as possible. Therefore, we try to learn the function f and its gradient field ∇f simultaneously.
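As a concrete numerical illustration of the projection formula in Fig. 1, the following sketch (our own code, not part of the paper) checks on the unit circle in R² that projecting the ambient derivative of the unit tangent field onto the tangent space yields zero, i.e., that this field is parallel:

```python
import numpy as np

# Numeric check of the projection formula of Fig. 1 on the unit circle,
# a 1-D manifold in R^2. All names here are our own illustration.
def covariant_derivative(t, V, h=1e-6):
    """Approximate the covariant derivative of V along the circle at
    x = (cos t, sin t) by projecting dV/dt onto the tangent space T_x M."""
    u = np.array([-np.sin(t), np.cos(t)])   # orthonormal tangent basis at x
    P = np.outer(u, u)                      # projection matrix P_x onto T_x M
    dV = (V(t + h) - V(t - h)) / (2 * h)    # ambient derivative along the curve
    return P @ dV

# V = unit tangent field of the circle; its ambient derivative points along
# the inward normal, so the projected (covariant) derivative vanishes and
# V is a parallel field.
V = lambda t: np.array([-np.sin(t), np.cos(t)])
print(covariant_derivative(0.3, V))         # ≈ [0, 0] up to numerical error
```

The same projection, with the tangent space estimated from data, is exactly what the discrete algorithm uses later in Section 3.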
Formally, we propose to learn a function f and a vector field V on the manifold with two constraints:\n\n• The vector field V should be close to the gradient field ∇f of f, which can be formulated as follows:\n\nmin_{f∈C∞, V} R1(f, V) = ∫_M ‖∇f − V‖²    (2)\n\n• The vector field V should be as parallel as possible:\n\nmin_V R2(V) = ∫_M ‖∇V‖²_F    (3)\n\nwhere ∇ is the covariant derivative on the manifold and ‖·‖_F denotes the Frobenius norm.\n\nIn the following, we provide some detailed explanation of R2(V). ∇V measures the change of the vector field V. If ∇V vanishes, then V is a parallel field. For a given direction Yx at x ∈ M, the geometrical meaning of ∇_Y V|_x is demonstrated in Fig. 1. For a fixed point x ∈ M, ∇V|_x is a linear map on the tangent space T_xM. According to the definition of the Frobenius norm, we have\n\n‖∇V‖²_F = Σ_{i,j=1}^d (g(∇_{∂i}V, ∂j))² = Σ_{i=1}^d g(∇_{∂i}V, ∇_{∂i}V)    (4)\n\nwhere g is the Riemannian metric on M and ∂1, . . . , ∂d is an orthonormal basis of T_xM.\nNaturally, we propose the following objective function based on vector field regularization:\n\narg min_{f∈C∞(M), V} E(f, V) = (1/l) Σ_{i=1}^l R0(xi, yi, f) + λ1 R1(f, V) + λ2 R2(V)    (5)\n\nFor the loss function R0, we use the squared loss R0(f(xi), yi) = (f(xi) − yi)² for simplicity.\n\n3 Implementation\n\nSince the manifold M is unknown, the function f which minimizes (5) can not be directly solved. In this section, we discuss how to discretize the continuous objective function (5).\n\n3.1 Vector Field Representation\n\nGiven l labeled data points (xi, yi)_{i=1}^l and n − l unlabeled points x_{l+1}, . . . , xn in R^m, let fi = f(xi), i = 1, . . .
, n; our goal is to learn a function f = (f1, . . . , fn)^T. We first construct a nearest neighbor graph by either ε-neighborhood or k nearest neighbors. Let xi ∼ xj denote that xi and xj are neighbors. For each point xi, we estimate its tangent space T_{xi}M by performing PCA on its neighborhood. We choose the largest d eigenvectors as the bases since T_{xi}M is d dimensional. Let Ti ∈ R^{m×d} be the matrix whose columns constitute an orthonormal basis for T_{xi}M. It is easy to show that Pi = Ti Ti^T is the unique orthogonal projection from R^m onto the tangent space T_{xi}M [11]. That is, for any vector a ∈ R^m, we have Pi a ∈ T_{xi}M and (a − Pi a) ⊥ Pi a.\nLet V be a vector field on the manifold. For each point xi, let Vxi denote the value of the vector field V at xi, and ∇V|_{xi} denote the value of ∇V at xi. According to the definition of a vector field, Vxi should be a vector in the tangent space T_{xi}M. Therefore, it can be represented by the local coordinates of the tangent space, Vxi = Ti vi, where vi ∈ R^d. We define V = (v1^T, . . . , vn^T)^T ∈ R^{dn}. That is, V is a dn-dimensional big column vector which concatenates all the vi's. In the following, we first discretize our objective function E(f, V), and then minimize it to obtain f and V.\n\n3.2 Gradient Field Computation\n\nIn order to discretize R1(f, V), we first discuss the Taylor expansion of f on the manifold. Let exp_x denote the exponential map at x. The exponential map exp_x : T_xM → M maps the tangent space T_xM to the manifold M. Let a ∈ T_xM be a tangent vector. Then there is a unique geodesic γ_a satisfying γ_a(0) = x with the initial tangent vector γ′_a(0) = a. The corresponding exponential map is defined as exp_x(ta) = γ_a(t), t ∈ [0, 1].
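The local PCA construction of the tangent bases T_i and projections P_i described in Section 3.1 can be sketched as follows (illustrative code of our own; the function name and toy neighborhood scheme are not from the paper):

```python
import numpy as np

def tangent_bases(X, neighbors, d):
    """Estimate an orthonormal basis T_i (m x d) of the tangent space at each
    point by local PCA on its neighborhood, as in Section 3.1."""
    T = []
    for nbrs in neighbors:
        Z = X[nbrs] - X[nbrs].mean(axis=0)   # center the neighborhood
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        T.append(Vt[:d].T)                   # columns: top-d principal directions
    return T

# Toy usage: 5 points on a line (a 1-D manifold) in R^3.
X = np.array([[t, 2 * t, 3 * t] for t in np.linspace(0.0, 1.0, 5)])
neighbors = [[max(i - 1, 0), i, min(i + 1, 4)] for i in range(5)]
T = tangent_bases(X, neighbors, d=1)
P0 = T[0] @ T[0].T   # orthogonal projection P_0 = T_0 T_0^T onto T_{x_0} M
# P0 projects ambient vectors onto span{(1, 2, 3)} and is idempotent.
```

The projection matrices P_i built this way are exactly the ones used to discretize the gradient and covariant derivative below.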
Locally, the exponential map is a diffeomorphism. Note that f ∘ exp_x : T_xM → R is a smooth function on T_xM. Then the following Taylor expansion of f holds:\n\nf(exp_x(a)) ≈ f(x) + ⟨∇f(x), a⟩,    (6)\n\nwhere a ∈ T_xM is a sufficiently small tangent vector. In the discrete case, let exp_{xi} denote the exponential map at xi. Since exp_{xi} is a diffeomorphism, there exists a tangent vector aij ∈ T_{xi}M such that exp_{xi}(aij) = xj. According to the definition of the exponential map, ‖aij‖ equals the geodesic distance between xi and xj, which can be denoted as dij. Let eij be the unit vector in the direction of aij, i.e., eij = aij/dij. We approximate aij by projecting the vector xj − xi to the tangent space, i.e., aij = Pi(xj − xi). Therefore, Eq. (6) can be rewritten as follows:\n\nf(xj) = f(xi) + ⟨∇f(xi), Pi(xj − xi)⟩    (7)\n\nSince f is unknown, ∇f is also unknown. In the following, we discuss how to compute ‖∇f(xi) − Vxi‖² discretely. We first show that the vector norm can be computed by an integral on a unit sphere, where the unit sphere can be discretely approximated by a neighborhood.\nLet u be a unit vector on the tangent space T_xM; then we have (see exercise 1.12 in [12])\n\n(1/ωd) ∫_{S^{d−1}} ⟨X, u⟩² dδ(X) = 1    (8)\n\nwhere S^{d−1} is the unit (d − 1)-sphere, ωd is its volume, and dδ is its volume form. Let ∂i, i = 1, . . . , d, be an orthonormal basis of T_xM. Then any vector b ∈ T_xM can be written as b = Σ_{i=1}^d bi ∂i. Furthermore, we have\n\n‖b‖² = Σ_{i=1}^d (bi)² = Σ_{i=1}^d (bi)² (1/ωd) ∫_{S^{d−1}} ⟨X, ∂i⟩² dδ(X) = (1/ωd) ∫_{S^{d−1}} ⟨X, b⟩² dδ(X)    (9)\n\nFrom Eq. (7), we can see that ⟨∇f(xi), Pi(xj − xi)⟩ = f(xj) − f(xi). Thus, we have\n\n‖∇f(xi) − Vxi‖² = (1/ωd) ∫_{S^{d−1}} ⟨X, ∇f(xi) − Vxi⟩² dδ(X)\n≈ Σ_{j∼i} ⟨eij, ∇f(xi) − Vxi⟩²\n= Σ_{j∼i} wij ⟨aij, ∇f(xi) − Vxi⟩²\n= Σ_{j∼i} wij ⟨Pi(xj − xi), ∇f(xi) − Vxi⟩²\n= Σ_{j∼i} wij ((Pi(xj − xi))^T Vxi − f(xj) + f(xi))²\n\nwhere wij = dij^{−2}. The weight wij can be approximated either by the heat kernel weight exp(−‖xi − xj‖²/δ) or simply by the 0−1 weight. Then R1 reduces to the following:\n\nR1(f, V) = Σ_i Σ_{j∼i} wij ((xj − xi)^T Ti vi − fj + fi)²    (10)\n\n3.3 Parallel Field Computation\n\nAs discussed before, we hope the vector field to be as parallel as possible on the manifold. In the discrete case, R2 becomes\n\nR2(V) = Σ_{i=1}^n ‖∇V|_{xi}‖²_F    (11)\n\nIn the following, we discuss how to approximate ‖∇V|_{xi}‖²_F for a given point xi. Since we do not know ∇_{∂i}V for a given basis ∂i, ‖∇V|_{xi}‖²_F cannot be computed according to Eq. (4). We define a (0, 2) symmetric tensor α as α(X, Y) = g(∇_X V, ∇_Y V), where X and Y are vector fields on the manifold.
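The discrete penalty R1 of Eq. (10) is straightforward to evaluate once the tangent bases are known; the sketch below is our own illustrative code (the function name and toy example are ours, not the paper's):

```python
import numpy as np

def r1_penalty(X, f, v, T, weights):
    """Discrete R1 of Eq. (10): sum over edges j ~ i of
    w_ij * ((x_j - x_i)^T T_i v_i - f_j + f_i)^2."""
    total = 0.0
    for (i, j), w in weights.items():  # weights maps ordered pairs (i, j), j ~ i, to w_ij
        r = (X[j] - X[i]) @ T[i] @ v[i] - f[j] + f[i]
        total += w * float(r) ** 2
    return total

# Toy check: points on the x-axis in R^2, f = first coordinate, and
# v_i = [1] (its gradient in local coordinates) give zero penalty,
# since f is exactly linear along the data.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
T = [np.array([[1.0], [0.0]])] * 3          # shared tangent basis
f = np.array([0.0, 1.0, 2.0])
v = [np.array([1.0])] * 3
w = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}
print(r1_penalty(X, f, v, T, w))            # 0.0 for this exactly linear f
```

A non-linear f on the same points (e.g. f = [0, 1, 4]) would yield a strictly positive penalty, which is what R1 is designed to measure.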
We have Trace(α) = Σ_{i=1}^d g(∇_{∂i}V, ∇_{∂i}V) = ‖∇V‖²_F, where ∂1, . . . , ∂d is an orthonormal basis of the tangent space. For the trace of α, we have the following geometric interpretation (see exercise 1.12 in [12]):\n\nTrace(α) = (1/ωd) ∫_{S^{d−1}} α(X, X) dδ(X)    (12)\n\nwhere S^{d−1} is the unit (d − 1)-sphere, ωd its volume, and dδ its volume form. So for a given point xi, we can approximate ‖∇V|_{xi}‖_F by the following:\n\n‖∇V|_{xi}‖²_F = Trace(α)|_{xi} = (1/ωd) ∫_{S^{d−1}} α(X, X)|_{xi} dδ(X) ≈ Σ_{j∼i} α(eij, eij) = Σ_{j∼i} ‖∇_{eij} V‖²    (13)\n\nThen we discuss how to discretize ∇_{eij}V. Given eij ∈ T_{xi}M, there exists a unique geodesic γ(t) which satisfies γ(0) = xi and γ′(0) = eij. Then the covariant derivative of the vector field V along eij is given by (please see Fig. 1)\n\n∇_{eij} V = Pi (dV/dt|_{t=0}) = Pi lim_{t→0} (V(γ(t)) − V(γ(0)))/t ≈ Pi (Vxj − Vxi)/dij = √wij (Pi Vxj − Vxi)\n\nsince Pi Vxi = Vxi and √wij = 1/dij. Combining with Eq. (13), R2 becomes:\n\nR2(V) = Σ_i Σ_{j∼i} wij ‖Pi Tj vj − Ti vi‖²    (14)\n\n3.4 Objective Function in the Discrete Form\n\nLet I denote an n × n diagonal matrix where Iii = 1 if xi is labeled and Iii = 0 otherwise. And let y ∈ R^n be a column vector whose i-th element is yi if xi is labeled and 0 otherwise. Then\n\nR0(f) = (1/l)(f − y)^T I(f − y)    (15)\n\nCombining R1 in Eq. (10) and R2 in Eq.
(14), the final objective function in the discrete form can be written as follows:\n\nE(f, V) = (1/l)(f − y)^T I(f − y) + λ1 Σ_i Σ_{j∼i} wij ((xj − xi)^T Ti vi − fj + fi)² + λ2 Σ_i Σ_{j∼i} wij ‖Pi Tj vj − Ti vi‖²    (16)\n\n3.5 Optimization\n\nIn this subsection, we discuss how to solve the optimization problem (16).\nLet L denote the Laplacian matrix of the graph with weights wij. Then we can rewrite R1 as follows:\n\nR1(f, V) = 2f^T Lf + Σ_i Σ_{j∼i} wij ((xj − xi)^T Ti vi)² − 2 Σ_i Σ_{j∼i} wij (xj − xi)^T Ti vi sij^T f\n\nwhere sij ∈ R^n is a selection vector of all zero elements except for the i-th element being −1 and the j-th element being 1. Then the partial derivative of R1 with respect to the variable vi is\n\n∂R1(f, V)/∂vi = 2 Σ_{j∼i} wij Ti^T (xj − xi)(xj − xi)^T Ti vi − 2 Σ_{j∼i} wij Ti^T (xj − xi) sij^T f\n\nThus we get\n\n∂R1(f, V)/∂V = 2GV − 2Cf    (17)\n\nwhere G is a dn × dn block diagonal matrix, and C = [C1^T, . . . , Cn^T]^T is a dn × n block matrix. Denoting the i-th d × d diagonal block of G by Gii and the i-th d × n block of C by Ci, we have\n\nGii = Σ_{j∼i} wij Ti^T (xj − xi)(xj − xi)^T Ti    (18)\n\nCi = Σ_{j∼i} wij Ti^T (xj − xi) sij^T    (19)\n\nThe partial derivative of R1 with respect to the variable f is\n\n∂R1(f, V)/∂f = 4Lf − 2C^T V    (20)\n\nSimilarly, we can compute the partial derivative of R2 with respect to the variable vi:\n\n∂R2(V)/∂vi = 2 Σ_{j∼i} wij ((Ti^T Tj Tj^T Ti + I)vi − 2Ti^T Tj vj) = 2 Σ_{j∼i} wij ((Qij Qij^T + I)vi − 2Qij vj)\n\nwhere Qij = Ti^T Tj. Thus we obtain\n\n∂R2/∂V = 2BV    (21)\n\nwhere B is a dn × dn sparse block matrix. If we index each d × d block by Bij, then for i, j = 1, . . . , n, we have\n\nBii = Σ_{j∼i} wij (Qij Qij^T + I)    (22)\n\nBij = −2wij Qij if xi ∼ xj, and Bij = 0 otherwise    (23)\n\nNotice that ∂R0/∂f = (2/l) I(f − y). Combining Eq. (17), Eq. (20) and Eq. (21), we have\n\n∂E(f, V)/∂f = ∂R0/∂f + λ1 ∂R1/∂f + λ2 ∂R2/∂f = 2((1/l)I + 2λ1L)f − 2λ1C^T V − (2/l)y    (24)\n\n∂E(f, V)/∂V = ∂R0/∂V + λ1 ∂R1/∂V + λ2 ∂R2/∂V = −2λ1Cf + 2(λ1G + λ2B)V    (25)\n\nRequiring that the derivatives vanish, we finally get the following linear system:\n\n[ (1/l)I + 2λ1L    −λ1C^T ; −λ1C    λ1G + λ2B ] [f ; V] = [ (1/l)y ; 0 ]    (26)\n\n(a) Ground truth    (b) Laplacian (3.65)    (c) Hessian (1.35)    (d) PFR (1.14)\n\nFigure 2: Global temperature prediction.
Regression on the satellite measurement of temperatures in the middle troposphere. 1% of the samples are randomly selected as training data. The ground truth is shown in (a). The colors indicate temperature values (in Kelvin). The regression results are visualized in (b)∼(d). The numbers in the captions are the mean absolute prediction errors.\n\n4 Related Work and Discussion\n\nThe approximation of the Laplacian operator using the graph Laplacian [5] has enjoyed great success in the last decade. Some theoretical results [13, 14] also show the consistency of the approximation. One of the most important features of the graph Laplacian is that it is coordinate free. That is, it does not depend on any special coordinate system.\nThe estimation of the Hessian is very difficult and there is little work on it. Previous approaches [2, 15] first estimate normal coordinates in the tangent space, and then estimate the first order derivative at each point, which is a matrix pseudo-inversion problem. One major limitation of this is that when the number of nearest neighbors k is larger than d + d(d + 1)/2, where d is the dimension of the manifold, the estimation will be inaccurate and unstable [15]. This is contradictory to the asymptotic case, since it is not desirable for k to be bounded by a finite number when the data is sufficiently dense. In contrast, our method is coordinate free. Also, we directly estimate the norm of the second order derivative instead of trying to estimate its coefficients, which turns out to be an integral problem over the neighboring points. We only need to do simple matrix multiplications to approximate the integral at each point, but do not have to solve matrix inversion problems.
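As a self-contained illustration of the overall procedure, the dense sketch below assembles the matrices L, G, C and B of Section 3.5 and solves the linear system of Eq. (26) on a toy dataset. All names are our own, and a practical implementation would use sparse matrices (e.g. scipy.sparse), which is what the paper's efficiency claim assumes:

```python
import numpy as np

def pfr_fit(X, T, edges, y, labeled, lam1=1.0, lam2=1.0):
    """Assemble and solve the PFR linear system of Eq. (26), densely."""
    n = X.shape[0]
    d = T[0].shape[1]
    l = len(labeled)
    L = np.zeros((n, n))          # graph Laplacian with weights w_ij
    G = np.zeros((n * d, n * d))  # block-diagonal matrix of Eq. (18)
    C = np.zeros((n * d, n))      # block matrix of Eq. (19)
    B = np.zeros((n * d, n * d))  # block matrix of Eqs. (22)-(23)
    for i, j, w in edges:         # each undirected neighbor pair (i, j) once
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
        for a, b in ((i, j), (j, i)):        # both orientations of the pair
            sa = slice(a * d, (a + 1) * d)
            sb = slice(b * d, (b + 1) * d)
            te = T[a].T @ (X[b] - X[a])      # T_a^T (x_b - x_a)
            Q = T[a].T @ T[b]                # Q_ab = T_a^T T_b
            G[sa, sa] += w * np.outer(te, te)
            C[sa, a] -= w * te               # s_ab has -1 at a, +1 at b
            C[sa, b] += w * te
            B[sa, sa] += w * (Q @ Q.T + np.eye(d))
            B[sa, sb] -= 2 * w * Q
    Ind = np.zeros((n, n))
    yv = np.zeros(n)
    for i, yi in zip(labeled, y):
        Ind[i, i] = 1.0
        yv[i] = yi
    A = np.block([[Ind / l + 2 * lam1 * L, -lam1 * C.T],
                  [-lam1 * C, lam1 * G + lam2 * B]])
    rhs = np.concatenate([yv / l, np.zeros(n * d)])
    sol = np.linalg.solve(A, rhs)            # Eq. (26)
    return sol[:n], sol[n:].reshape(n, d)

# Toy check: 5 points on a line in R^2 with two labels consistent with a
# linear function; PFR recovers the linear interpolation f_i = i and the
# constant (parallel) gradient field v_i = 1.
X = np.array([[float(t), 0.0] for t in range(5)])
T = [np.array([[1.0], [0.0]])] * 5
edges = [(i, i + 1, 1.0) for i in range(4)]
f, V = pfr_fit(X, T, edges, y=[0.0, 4.0], labeled=[0, 4])
print(np.round(f, 6))   # recovers f_i = i up to numerical error
```

On this toy problem the unique zero-energy solution is exactly the linear function, so the solver reproduces it; with noisy labels and curved data, λ1 and λ2 trade off fidelity against parallelism as in the experiments.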
Therefore, asymptotically, we would expect our method to be much more accurate and robust for the approximation of the norm of the second order derivative.\n\n5 Experiments\n\nIn this section, we compare our proposed Parallel Field Regularization (PFR) algorithm with two state-of-the-art semi-supervised regression methods: Laplacian regularized transduction (Laplacian) [8] and Hessian regularized transduction (Hessian)¹ [15], respectively. Our experiments are carried out on two real-world data sets. Regularization parameters for all algorithms are chosen via cross-validation.\n\n5.1 Global Temperature\n\nIn this test, we perform regression on the earth surface, which is a 2D sphere manifold. We try to predict the satellite measurement of temperatures in the middle troposphere in Dec. 2004², which contains 9504 valid temperature measurements. The coordinates (latitude, longitude) of the measurements are used as features and the corresponding temperature values are the responses. The dimension of the manifold is set to 2 and the number of nearest neighbors is set to 6 in graph construction. We randomly select 1% of the samples as labeled data, and compare the predicted temperature values with the ground truth on the rest of the data.\nThe regression results are shown in Fig. 2. The numbers in the captions indicate the mean absolute prediction errors generated by different algorithms. It can be seen from the visualization result that\n\n¹We use the code from the authors downloadable from http://www.ml.uni-saarland.de/code/HessianSSR/HessianSSR.html.\n²http://www.remss.com/msu/.\n\n[Figure 4 panels; rows: frame 016, frame 300; columns: Laplacian, Hessian, PFR]\n\nFigure 4: Examples of regression results on the moving hand data set. 60 labeled samples are used for training. Each row shows the results obtained via the three algorithms for a frame.
In each image, the red dots indicate the ground truth positions we labeled manually, and the blue arrows show the positions predicted by different algorithms.\n\nFigure 3: Results on the moving hand dataset.\n\nHessian and PFR perform better than Laplacian. Furthermore, from the prediction errors, we can see that PFR outperforms Hessian.\n\n5.2 Positions of Moving Hands\n\nIn this subsection, we perform experiments using a video of a subject sitting on a sofa and waving his arms¹. Our goal is to predict the positions of the (left and right) elbows and wrists. We extract the first 500 frames of the video and manually label the positions of the elbows and wrists. We scale each frame to a size of 120 × 90 and use the raw pixels (10800-dimensional vectors) as the features. The response for each frame is an 8-dimensional vector whose elements are the 2D coordinates of the elbows and wrists. Since there are 8 free parameters, we set the dimension of the manifold to 8. We use 18 nearest neighbors in graph construction.\nWe run the experiments with different numbers of labeled frames. For each given number of labeled frames, we perform 10 tests with randomly selected labeled sets. The average of the mean absolute error (MAE) for each test is calculated. The final result is shown in Fig. 3. As can be seen, PFR consistently outperforms the other two algorithms. Laplacian yields high MAE. Hessian is very unstable on this dataset, and the results vary drastically with different numbers of labels.\nWe also show some example frames in Fig. 4. The red dots in the figures indicate the ground truth positions and the blue arrows are drawn by connecting the positions of elbows and wrists predicted by different algorithms. Again we can verify that PFR performs better than the other two algorithms.\n\n6 Conclusion\n\nIn this paper, we propose a novel semi-supervised learning algorithm from the vector field perspective.
We show the relationship between vector fields and functions on the manifold. The parallelism of the vector field is used to measure the linearity of the target prediction function. Parallel fields are one kind of special vector fields on the manifold, which have very nice properties. It is interesting to explore other kinds of vector fields to facilitate learning on manifolds. Moreover, vector fields can also be used to study the geometry and topology of the manifold. For example, the Poincaré-Hopf theorem tells us that the sum of the indices over all the isolated zeroes of a vector field equals the Euler characteristic of the manifold, which is a very important topological invariant.\n\nAcknowledgments\n\nThis work was supported by the National Natural Science Foundation of China under Grant 61125203, the National Basic Research Program of China (973 Program) under Grant 2012CB316404, and the National Natural Science Foundation of China under Grants 90920303 and 60875044.\n\n¹The video is obtained from http://www.csail.mit.edu/~rahimi/manif.\n\n[Figure 3 axes: number of labels (20–100) vs. MAE (5–20); curves: PFR, Hessian, Laplacian]\n\nReferences\n[1] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.\n[2] D. L. Donoho and C. E. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences of the United States of America, 100(10):5591–5596, 2003.\n[3] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.\n[4] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems 14, pages 585–591, 2001.\n[5] Fan R. K. Chung.
Spectral Graph Theory, volume 92 of Regional Conference Series in Mathematics. AMS, 1997.\n[6] X. Zhu and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proc. of the 20th International Conference on Machine Learning, 2003.\n[7] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, 2003.\n[8] Mikhail Belkin, Irina Matveeva, and Partha Niyogi. Regularization and semi-supervised learning on large graphs. In Conference on Learning Theory, pages 624–638, 2004.\n[9] John Lafferty and Larry Wasserman. Statistical analysis of semi-supervised regression. In Advances in Neural Information Processing Systems 20, pages 801–808, 2007.\n[10] P. Petersen. Riemannian Geometry. Springer, New York, 1998.\n[11] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.\n[12] B. Chow, P. Lu, and L. Ni. Hamilton's Ricci Flow. AMS, Providence, Rhode Island, 2006.\n[13] Mikhail Belkin and Partha Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. In Conference on Learning Theory, pages 486–500, 2005.\n[14] Matthias Hein, Jean-Yves Audibert, and Ulrike von Luxburg. From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. In Conference on Learning Theory, pages 470–485, 2005.\n[15] K. I. Kim, F. Steinke, and M. Hein. Semi-supervised regression using Hessian energy with an application to semi-supervised dimensionality reduction. In Advances in Neural Information Processing Systems 22, pages 979–987,
2009.\n", "award": [], "sourceid": 323, "authors": [{"given_name": "Binbin", "family_name": "Lin", "institution": null}, {"given_name": "Chiyuan", "family_name": "Zhang", "institution": null}, {"given_name": "Xiaofei", "family_name": "He", "institution": null}]}