{"title": "Adaptive Manifold Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 1473, "page_last": 1480, "abstract": null, "full_text": "Adaptive Manifold Learning\n\nJing Wang, Zhenyue Zhang\nDepartment of Mathematics\nZhejiang University, Yuquan Campus,\nHangzhou, 310027, P. R. China\nwroaring@sohu.com\nzyzhang@zju.edu.cn\n\nHongyuan Zha\nDepartment of Computer Science\nPennsylvania State University\nUniversity Park, PA 16802\nzha@cse.psu.edu\n\nAbstract\n\nRecently, there have been several advances in the machine learning and pattern recognition communities for developing manifold learning algorithms to construct nonlinear low-dimensional manifolds from sample data points embedded in high-dimensional spaces. In this paper, we develop algorithms that address two key issues in manifold learning: 1) the adaptive selection of the neighborhood sizes; and 2) better fitting the local geometric structure to account for the variations in the curvature of the manifold and its interplay with the sampling density of the data set. We also illustrate the effectiveness of our methods on some synthetic data sets.\n\n1    Introduction\n\nRecently, there have been advances in the machine learning community for developing effective and efficient algorithms for constructing nonlinear low-dimensional manifolds from sample data points embedded in high-dimensional spaces, emphasizing simple algorithmic implementation and avoiding optimization problems prone to local minima. 
The proposed algorithms include Isomap [6], locally linear embedding (LLE) [3] and its variations, manifold charting [1], Hessian LLE [2] and local tangent space alignment (LTSA) [7], and they have been successfully applied in several computer vision and pattern recognition problems. Several drawbacks and possible extensions of the algorithms have been pointed out in [4, 7]. The focus of this paper is to address two key issues in manifold learning: 1) how to adaptively select the neighborhood sizes in the k-nearest neighbor computation to construct the local connectivity; and 2) how to account for the variations in the curvature of the manifold and its interplay with the sampling density of the data set. We will discuss those two issues in the context of local tangent space alignment (LTSA) [7], a variation of locally linear embedding (LLE) [3] (see also [5], [1]). We believe the basic ideas we propose can be similarly applied to other manifold learning algorithms.\n\nWe first outline the basic steps of LTSA and illustrate its failure modes using two simple examples. Given a data set X = [x_1, . . . , x_N] with x_i ∈ R^m, sampled (possibly with noise) from a d-dimensional manifold (d < m), LTSA proceeds in the following steps.\n\n1) LOCAL NEIGHBORHOOD CONSTRUCTION. For each x_i, i = 1, . . . , N, determine a set X_i = [x_{i_1}, . . . , x_{i_{k_i}}] of its neighbors (k_i nearest neighbors, for example).\n\n2) LOCAL LINEAR FITTING. Compute an orthonormal basis Q_i for the d-dimensional tangent space of the manifold at x_i, and the orthogonal projection of each x_{i_j} to the tangent space: θ_j^{(i)} = Q_i^T (x_{i_j} - x̄_i), where x̄_i is the mean of the neighbors.\n\n3) LOCAL COORDINATES ALIGNMENT. Align the N local projections Θ_i = [θ_1^{(i)}, . . . , θ_{k_i}^{(i)}], i = 1, . . . , N, to obtain the global coordinates τ_1, . . . , τ_N. Such an alignment is achieved by minimizing the global reconstruction error\n\n    Σ_i ‖E_i‖_2^2 ≡ Σ_i ‖T_i (I - (1/k_i) e e^T) - L_i Θ_i‖_2^2    (1.1)\n\nover all possible L_i ∈ R^{d×d} and row-orthonormal T = [τ_1, . . . , τ_N] ∈ R^{d×N}, where T_i = [τ_{i_1}, . . . , τ_{i_{k_i}}] with the index set {i_1, . . . , i_{k_i}} determined by the neighborhood of each x_i, and e is a vector of all ones.\n\nFigure 1: The data sets (first column) and computed coordinates τ_i by LTSA vs. the centered arc-length coordinates, for k = 4, k = 6 and k = 8 (second to fourth columns). Top row: Example 1. Bottom row: Example 2.\n\nTwo strategies are commonly used for selecting the local neighborhood size k_i: one is the k nearest neighborhood (k-NN with a constant k for all the sample points) and the other is the ε-neighborhood [3, 6]. The effectiveness of manifold learning algorithms, including LTSA, depends on how nearby neighborhoods overlap with each other, and on the variation of the curvature of the manifold and its interplay with the sampling density [4]. We illustrate those issues with two simple examples.\n\nExample 1. We sample data points from a half unit circle, x_i = [cos(t_i), sin(t_i)]^T, i = 1, . . . , N. It is easy to see that t_i represents the arc length along the circle. We choose t_i ∈ [0, π] according to\n\n    t_{i+1} - t_i = 0.1 (0.001 + |cos(t_i)|)\n\nstarting at t_1 = 0, and set N = 152 so that t_N ≤ π and t_{N+1} > π. Clearly, the half circle has unit curvature everywhere. This is an example of highly-varying sampling density.\n\nExample 2. 
The data set is generated as x_i = [t_i, 10 e^{-t_i^2}]^T, i = 1, . . . , N, where t_i ∈ [-6, 6] are uniformly distributed. The curvature of the 1-D curve at parameter value t is given by\n\n    c_g(t) = 20 |1 - 2t^2| e^{-t^2} / (1 + 400 t^2 e^{-2t^2})^{3/2},\n\nwhich changes from min_t c_g(t) = 0 to max_t c_g(t) = 20 over t ∈ [-6, 6]. We set N = 180. This is an example of highly-varying curvature.\n\nFor the above two data sets, LTSA with the constant k-NN strategy fails for every reasonable k we have tested. So does LTSA with constant ε-neighborhoods. In the first column of Figure 1, we plot these two data sets. The coordinates computed by LTSA with constant k-neighborhoods are plotted against the centered arc-length coordinates for a selected range of k (ideally, the plots should display points on a straight line of slope π/4).\n\n2    Adaptive Neighborhood Selection\n\nIn this section, we propose a neighborhood contraction and expansion algorithm for adaptively selecting k_i at each sample point x_i. We assume that the data are generated from a parameterized manifold, x_i = f(τ_i), i = 1, . . . , N, where f : Ω ⊂ R^d → R^m. 
If f is smooth enough, using first-order Taylor expansion at a fixed τ, for a neighboring τ̄, we have\n\n    f(τ̄) = f(τ) + J_f(τ) · (τ̄ - τ) + ε(τ, τ̄),    (2.2)\n\nwhere J_f(τ) ∈ R^{m×d} is the Jacobi matrix of f at τ and ε(τ, τ̄) represents the error term determined by the Hessian of f, ‖ε(τ, τ̄)‖ ≈ c_f(τ) ‖τ̄ - τ‖_2^2, where c_f(τ) ≥ 0 represents the curvature of the manifold at τ. Setting τ = τ_i and τ̄ = τ_{i_j} gives\n\n    x_{i_j} = x_i + J_f(τ_i) · (τ_{i_j} - τ_i) + ε(τ_i, τ_{i_j}).    (2.3)\n\nA point x_{i_j} can be regarded as a neighbor of x_i with respect to the tangent space spanned by the columns of J_f(τ_i) if\n\n    ‖τ_{i_j} - τ_i‖_2 is small and ‖ε(τ_i, τ_{i_j})‖_2 ≪ ‖J_f(τ_i) · (τ_{i_j} - τ_i)‖_2.\n\nThe above conditions, however, are difficult to verify in practice since we do not know J_f(τ_i). 
To get around this problem, consider an orthogonal basis matrix Q_i of the tangent space spanned by the columns of J_f(τ_i), which can be approximately computed by the SVD of X_i - x̄_i e^T, where x̄_i is the mean of the neighbors x_{i_j} = f(τ_{i_j}), j = 1, . . . , k_i. Note that\n\n    x̄_i = (1/k_i) Σ_{j=1}^{k_i} x_{i_j} = x_i + J_f(τ_i) · (τ̄_i - τ_i) + ε̄_i,\n\nwhere τ̄_i is the mean of τ_{i_1}, . . . , τ_{i_{k_i}} and ε̄_i is the mean of ε(τ_i, τ_{i_1}), . . . , ε(τ_i, τ_{i_{k_i}}). Eliminating x_i in (2.3) by the representation above yields\n\n    x_{i_j} = x̄_i + J_f(τ_i) · (τ_{i_j} - τ̄_i) + ε_j^{(i)}\n\nwith ε_j^{(i)} = ε(τ_i, τ_{i_j}) - ε̄_i. Letting θ_j^{(i)} = Q_i^T (x_{i_j} - x̄_i), we have x_{i_j} = x̄_i + Q_i θ_j^{(i)} + ε_j^{(i)}. Thus, x_{i_j} can be selected as a neighbor of x_i if the orthogonal projection θ_j^{(i)} is small and\n\n    ‖ε_j^{(i)}‖_2 = ‖x_{i_j} - x̄_i - Q_i θ_j^{(i)}‖_2 ≪ ‖Q_i θ_j^{(i)}‖_2 = ‖θ_j^{(i)}‖_2.    (2.4)\n\nAssuming all the x_{i_j} satisfy the above inequality, we should approximately have\n\n    ‖(I - Q_i Q_i^T)(X_i - x̄_i e^T)‖_F ≤ η ‖Q_i^T (X_i - x̄_i e^T)‖_F.    (2.5)\n\nWe will use (2.5) as a criterion for adaptive neighbor selection, starting with a K-NN at each sample point x_i with a large enough initial K and deleting points one by one until (2.5) holds. This process will terminate when the neighborhood size equals d + k_0 for some small k_0 and (2.5) is not true. 
In that case, we may need to reselect a k-NN that minimizes the ratio\n\n    ‖(I - Q_i Q_i^T)(X_i - x̄_i e^T)‖_F / ‖Q_i^T (X_i - x̄_i e^T)‖_F\n\nas the neighborhood set, as is detailed below.\n\nNEIGHBORHOOD CONTRACTION.\n\n     C0. Determine the initial K and the K-NN neighborhood X_i^{(K)} = [x_{i_1}, . . . , x_{i_K}] for x_i, ordered in non-decreasing distances to x_i,\n\n             ‖x_{i_1} - x_i‖ ≤ ‖x_{i_2} - x_i‖ ≤ . . . ≤ ‖x_{i_K} - x_i‖.\n\n         Set k = K.\n\n     C1. Let x̄_i^{(k)} be the column mean of X_i^{(k)}. 
Compute the orthogonal basis matrix Q_i^{(k)}, whose columns are the left singular vectors of X_i^{(k)} - x̄_i^{(k)} e^T corresponding to its d largest singular values. Set Θ_i^{(k)} = (Q_i^{(k)})^T (X_i^{(k)} - x̄_i^{(k)} e^T).\n\n     C2. If ‖X_i^{(k)} - x̄_i^{(k)} e^T - Q_i^{(k)} Θ_i^{(k)}‖_F < η ‖Θ_i^{(k)}‖_F, then set X_i = X_i^{(k)}, Θ_i = Θ_i^{(k)}, and terminate.\n\n     C3. If k > d + k_0, then delete the last column of X_i^{(k)} to obtain X_i^{(k-1)}, set k := k - 1, and go to step C1; otherwise, go to step C4.\n\n     C4. Let k = arg min_{d+k_0 ≤ j ≤ K} ‖X_i^{(j)} - x̄_i^{(j)} e^T - Q_i^{(j)} Θ_i^{(j)}‖_F / ‖Θ_i^{(j)}‖_F, and set X_i = X_i^{(k)}, Θ_i = Θ_i^{(k)}.\n\nStep C4 means that if there is no k-NN (k ≥ d + k_0) satisfying (2.5), then the contracted neighborhood X_i should be one that minimizes ‖X_i - x̄_i e^T - Q_i Θ_i‖_F / ‖Θ_i‖_F.\n\nOnce the contraction step is done we can still add back some of the unselected x_{i_j} to increase the overlap of nearby neighborhoods while still keeping (2.5) intact. In fact, we can add x_{i_j} if ‖x_{i_j} - x̄_i - Q_i θ_j^{(i)}‖ ≤ η ‖θ_j^{(i)}‖, which is demonstrated in the following result (we refer to [8] for the proof).\n\nTheorem 2.1 Let X_i = [x_{i_1}, . . . , x_{i_k}] satisfy (2.5). Furthermore, we assume\n\n    ‖x_{i_j} - x̄_i - Q_i θ_j^{(i)}‖ ≤ η ‖θ_j^{(i)}‖,    j = k + 1, . . . , k + p,    (2.6)\n\nwhere θ_j^{(i)} = Q_i^T (x_{i_j} - x̄_i). Denote by x̃_i the column mean of the expanded matrix X̃_i = [X_i, x_{i_{k+1}}, . . . , x_{i_{k+p}}]. Then for the left-singular vector matrix Q̃_i corresponding to the d largest singular values of X̃_i - x̃_i e^T,\n\n    ‖(I - Q̃_i Q̃_i^T)(X̃_i - x̃_i e^T)‖_F ≤ η ‖Q̃_i^T (X̃_i - x̃_i e^T)‖_F + (p / (k + p)) ‖Σ_{j=k+1}^{k+p} θ_j^{(i)}‖_2.\n\nThe above result shows that if the mean of the projections θ_j^{(i)} of the expanding neighbors is small and/or the number of the expanding points is relatively small, then approximately,\n\n    ‖(I - Q̃_i Q̃_i^T)(X̃_i - x̃_i e^T)‖_F ≤ η ‖Q̃_i^T (X̃_i - x̃_i e^T)‖_F.\n\nNEIGHBORHOOD EXPANSION.\n\n     E0. Set k_i to be the column number of X_i obtained by the neighborhood contracting step. For j = k_i + 1, . . . , K, compute θ_j^{(i)} = Q_i^T (x_{i_j} - x̄_i).\n\n     E1. Denote by J_i the index subset of j's, k_i < j ≤ K, such that ‖(I - Q_i Q_i^T)(x_{i_j} - x̄_i)‖_2 ≤ η ‖θ_j^{(i)}‖_2. Expand X_i by adding x_{i_j}, j ∈ J_i.\n\nExample 3. We construct the data points as x_i = [sin(t_i), cos(t_i), 0.02 t_i]^T, i = 1, . . . , N, with t_i ∈ [0, 4π] uniformly distributed, which is plotted in the top-left panel in Figure 2.\n\nFigure 2: Plots of the data set (top left), the computed coordinates τ_i by LTSA vs. the centered arc-length coordinates ((a)-(c), panel titles k = 7, 8, 9), the computed coordinates τ_i by LTSA with neighborhood contraction vs. the centered arc-length coordinates ((e)-(g), panel titles k = 15, 30, 35), and the computed coordinates τ_i by LTSA with neighborhood contraction and expansion vs. the centered arc-length coordinates ((d), bottom left, panel title k = 30).\n\nLTSA with constant k-NN fails for any k: a small k leads to a lack of necessary overlap among the neighborhoods, while for a large k the computed tangent space cannot represent the local geometry well. In (a)-(c) of Figure 2, we plot the coordinates computed by LTSA vs. the arc-length of the curve. Contracting the neighborhoods without expansion also gives poor results because the resulting neighborhoods are small; see (e)-(g) of Figure 2. Panel (d) of Figure 2 gives an excellent result computed by LTSA with both neighborhood contraction and expansion. We note that our adaptive strategies also work well for noisy data sets; we refer the reader to [8] for some examples.\n\n\n3      Alignment incorporating variations of manifold curvature\n\nLet X_i = [x_{i_1}, . . . , x_{i_{k_i}}] consist of the neighbors determined by the contraction and expansion steps in the above section. In (1.1), we can show that the size of the error term ‖E_i‖_2 depends on the size of the curvature of the manifold at the sample point x_i [8]. To make the minimization in (1.1) more uniform, we need to factor out the effect of the variations of the curvature. 
To this end, we pose the following minimization problem,\n\n    min_{T, {L_i}} Σ_i (1/k_i) ‖(T_i (I - (1/k_i) e e^T) - L_i Θ_i) D_i^{-1}‖_2^2,    (3.7)\n\nwhere D_i = diag(φ_i(θ_1^{(i)}), . . . , φ_i(θ_{k_i}^{(i)})), and φ_i(θ_j^{(i)}) is proportional to the curvature of the manifold at the parameter value τ_i, the computation of which will be discussed below. For fixed T, the optimal L_i is given by L_i = T_i (I_{k_i} - (1/k_i) e e^T) Θ_i^+ = T_i Θ_i^+. Substituting it into (3.7), we have the reduced minimization problem\n\n    min_T Σ_i (1/k_i) ‖T_i (I_{k_i} - (1/k_i) e e^T - Θ_i^+ Θ_i) D_i^{-1}‖_2^2.\n\nImposing the normalization condition T T^T = I, a solution to the minimization problem above is given by the d eigenvectors corresponding to the second to (d + 1)st smallest eigenvalues of the following matrix\n\n    B ≡ (SW) diag(D_1^{-2}/k_1, . . . , D_N^{-2}/k_N) (SW)^T,\n\nwhere W = (I_{k_i} - (1/k_i) e e^T)(I_{k_i} - Θ_i^+ Θ_i). Second-order analysis of the error term in (1.1) shows that we can set φ_i(θ_j^{(i)}) = γ + c_f(τ_i) ‖θ_j^{(i)}‖_2^2 with a small positive constant γ to ensure φ_i(θ_j^{(i)}) > 0, and c_f(τ_i) ≥ 0 represents the mean of the curvatures c_f(τ_i, τ_{i_j}) for all neighbors of x_i.\n\nLet Q_i denote the orthonormal matrix of the d leading left singular vectors of X_i (I - (1/k_i) e e^T). We can approximately compute c_f(τ_i) as follows.\n\n    c_f(τ_i) ≈ (1/(k_i - 1)) Σ_{ℓ=2}^{k_i} arccos(σ_min(Q_i^T Q_{i_ℓ})) / ‖θ_ℓ^{(i)}‖_2,\n\nwhere σ_min(·) is the smallest singular value of a matrix. 
Then the diagonal weights $\\phi_i(\\theta_j^{(i)})$ can be computed as\n\n$$\\phi_i(\\theta_j^{(i)}) = \\eta + \\frac{\\|\\theta_j^{(i)}\\|_2^2}{k_i - 1} \\sum_{\\ell=2}^{k_i} \\frac{\\arccos(\\sigma_{\\min}(Q_i^T Q_{i_\\ell}))}{\\|\\theta_\\ell^{(i)}\\|_2}.$$\n\nWith the above preparation, we are now ready to present the adaptive LTSA algorithm. Given a data set $X = [x_1, \\ldots, x_N]$, the approach consists of the following steps:\n\nStep 1. Determine the neighborhood $X_i = [x_{i_1}, \\ldots, x_{i_{k_i}}]$ for each $x_i$, $i = 1, \\ldots, N$, using the neighborhood contraction/expansion steps in Section 2.\n\nStep 2. Compute the truncated SVD, say $Q_i \\Sigma_i V_i^T$, of $X_i(I - \\frac{1}{k_i} ee^T)$ with $d$ columns in both $Q_i$ and $V_i$, the projections $\\theta_\\ell^{(i)} = Q_i^T (x_{i_\\ell} - \\bar{x}_i)$ with the mean $\\bar{x}_i$ of the neighbors, and denote $\\Theta_i = [\\theta_1^{(i)}, \\ldots, \\theta_{k_i}^{(i)}]$.\n\nStep 3. Estimate the curvatures as follows. For each $i = 1, \\ldots, N$,\n\n$$c_i = \\frac{1}{k_i - 1} \\sum_{\\ell=2}^{k_i} \\frac{\\arccos(\\sigma_{\\min}(Q_i^T Q_{i_\\ell}))}{\\|\\theta_\\ell^{(i)}\\|_2}.$$\n\nStep 4. Construct alignment matrix. For $i = 1, \\ldots, N$, set\n\n$$W_i = I_{k_i} - \\left[ \\frac{1}{\\sqrt{k_i}} e, V_i \\right] \\left[ \\frac{1}{\\sqrt{k_i}} e, V_i \\right]^T, \\qquad D_i = \\eta I + \\mathrm{diag}(c_i \\|\\theta_1^{(i)}\\|_2^2, \\ldots, c_i \\|\\theta_{k_i}^{(i)}\\|_2^2),$$\n\nwhere $\\eta$ is a small constant (usually we set $\\eta = 1.0 \\times 10^{-6}$). Set initial $B = 0$. Update $B$ iteratively by\n\n$$B(I_i, I_i) := B(I_i, I_i) + W_i D_i^{-1} D_i^{-1} W_i^T / k_i, \\quad i = 1, \\ldots, N.$$\n\nStep 5. Align global coordinates. Compute the eigenvectors of $B$ corresponding to its $d + 1$ smallest eigenvalues, take the eigenvector matrix $[u_2, \\ldots, u_{d+1}]$ corresponding to the 2nd to $(d+1)$st smallest eigenvalues, and set $T = [u_2, \\ldots, u_{d+1}]^T$.\n\n\n4          Experimental Results\n\nIn this section, we present several numerical examples to illustrate the performance of the adaptive LTSA algorithm. 
The test data sets include curves in 2D/3D Euclidean spaces.\n\n[Figure 3 plots omitted; panel titles: k=4, k=6, k=8, k=16.]\n\nFigure 3: The computed coordinates $\\tau_i$ by LTSA taking into account curvature and variable size of neighborhood.\n\nFirst we apply the adaptive LTSA to the data sets shown in Examples 1 and 2. Adaptive LTSA with different starting $k$'s works very well; see Figure 3. It shows that for these two data sets, the adaptive LTSA is not sensitive to the choice of the starting $k$ or the variations in sampling densities and manifold curvatures.\n\nNext, we consider the swiss-roll surface defined by $f(s, t) = [s \\cos(s), t, s \\sin(s)]^T$. It is easy to see that $J_f(s, t)^T J_f(s, t) = \\mathrm{diag}(1 + s^2, 1)$. Denoting by $s = s(r)$ the inverse transformation of $r = r(s)$ defined by\n\n$$r(s) = \\int_0^s \\sqrt{1 + \\alpha^2}\\, d\\alpha = \\frac{1}{2} \\left( s \\sqrt{1 + s^2} + \\mathrm{arcsinh}(s) \\right),$$\n\nthe swiss-roll surface can be parameterized as\n\n$$\\hat{f}(r, t) = [s(r) \\cos(s(r)), t, s(r) \\sin(s(r))]^T$$\n\nand $\\hat{f}$ is isometric with respect to $(r, t)$. 
In the left figure of Figure 4, we show there is a distortion between the computed coordinates by LTSA with the best-fit neighborhood size (bottom left) and the generating coordinates $(r, t)^T$ (top right). In the right panel of the bottom row of the left figure of Figure 4, we plot the computed coordinates by the adaptive LTSA with initial neighborhood size $k = 30$. (In fact, the adaptive LTSA is insensitive to $k$ and we will get similar results with a larger or smaller initial $k$.) We can see that the computed coordinates by the adaptive LTSA can recover the generating coordinates well without much distortion.\n\nFinally we applied both LTSA and the adaptive LTSA to a 2D manifold with 3 peaks embedded in a 100-dimensional space. The data points are generated as follows. First we generate $N = 2000$ 3D points, $y_i = (t_i, s_i, h(t_i, s_i))^T$, where $t_i$ and $s_i$ are randomly distributed in the interval $[-1.5, 1.5]$ and $h(t, s)$ is defined by\n\n$$h(t, s) = e^{-20t^2 - 20s^2} - e^{-10t^2 - 10(s+1)^2} - e^{-10(1+t)^2 - 10s^2}.$$\n\nThen we embed the 3D points into a 100D space by $x_i^Q = Q y_i$, $x_i^H = H y_i$, where $Q \\in R^{100 \\times 3}$ is a random orthonormal matrix resulting in an orthogonal transformation and $H \\in R^{100 \\times 3}$ a matrix with its singular values uniformly distributed in $(0, 1)$ resulting in an affine transformation. 
In the top row of the right figure of Figure 4, we plot the computed coordinates by LTSA for $x_i^Q$ (shown in (a)) and $x_i^H$ (shown in (b)) with best-fit neighborhood size $k = 15$. We can see the deformations (stretching and compression) are quite prominent. In the bottom row of the right figure of Figure 4, we plot the computed coordinates by the adaptive LTSA for $x_i^Q$ (shown in (c)) and $x_i^H$ (shown in (d)) with initial neighborhood size $k = 15$. It is clear that the adaptive LTSA gives a much better result.\n\n[Figure 4 plots omitted; panel titles: swiss roll, Generating Coordinate, (a), (b), (c), (d).]\n\nFigure 4: Left figure: 3D swiss-roll and the generating coordinates (top row), computed 2D coordinates by LTSA with the best neighborhood size $k = 15$ (bottom left) and computed 2D coordinates by adaptive LTSA (bottom right). Right figure: coordinates computed by LTSA for the orthogonally embedded 100D data set $\\{x_i^Q\\}$ (a) and the affinely embedded 100D data set $\\{x_i^H\\}$ (b), and the coordinates computed by the adaptive LTSA for $\\{x_i^Q\\}$ (c) and $\\{x_i^H\\}$ (d).\n\n\nReferences\n\n [1] M. Brand. Charting a manifold. Advances in Neural Information Processing Systems, 15, MIT Press, 2003.\n\n [2] D. Donoho and C. Grimes. Hessian Eigenmaps: new tools for nonlinear dimensionality reduction. Proceedings of National Academy of Science, 5591-5596, 2003.\n\n [3] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.\n\n [4] L. Saul and S. Roweis. 
Think globally, fit locally: unsupervised learning of nonlinear manifolds. Journal of Machine Learning Research, 4:119-155, 2003.\n\n [5] E. Teh and S. Roweis. Automatic Alignment of Local Representations. Advances in Neural Information Processing Systems, 15, MIT Press, 2003.\n\n [6] J. Tenenbaum, V. De Silva and J. Langford. A global geometric framework for nonlinear dimension reduction. Science, 290:2319-2323, 2000.\n\n [7] Z. Zhang and H. Zha. Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment. SIAM J. Scientific Computing, 26:313-338, 2004.\n\n [8] J. Wang, Z. Zhang and H. Zha. Adaptive Manifold Learning. Technical Report CSE-04-21, Dept. CSE, Pennsylvania State University, 2004.\n\n\f\n", "award": [], "sourceid": 2560, "authors": [{"given_name": "Jing", "family_name": "Wang", "institution": null}, {"given_name": "Zhenyue", "family_name": "Zhang", "institution": null}, {"given_name": "Hongyuan", "family_name": "Zha", "institution": null}]}