{"title": "Iterative Non-linear Dimensionality Reduction with Manifold Sculpting", "book": "Advances in Neural Information Processing Systems", "page_first": 513, "page_last": 520, "abstract": null, "full_text": "Iterative Non-linear Dimensionality Reduction with Manifold Sculpting

Mike Gashler, Dan Ventura, and Tony Martinez *

Brigham Young University

Provo, UT 84604

Abstract

Many algorithms have been developed recently for reducing dimensionality by projecting data onto an intrinsic non-linear manifold. Unfortunately, existing algorithms often lose significant precision in this transformation. Manifold Sculpting is a new algorithm that iteratively reduces dimensionality by simulating surface tension in local neighborhoods. We present several experiments that show Manifold Sculpting yields more accurate results than existing algorithms with both generated and natural data-sets. Manifold Sculpting is also able to benefit from prior dimensionality reduction efforts.

1 Introduction

Dimensionality reduction is a two-step process: 1) transform the data so that more information will survive the projection, and 2) project the data into fewer dimensions. The more relationships between data points that the transformation step is required to preserve, the less flexibility it will have to position the points in a manner that will cause information to survive the projection step. Due to this inverse relationship, dimensionality reduction algorithms must seek a balance that preserves information in the transformation without losing it in the projection. The key to finding the right balance is to identify where the majority of the information lies.

Nonlinear dimensionality reduction (NLDR) algorithms seek this balance by assuming that the relationships between neighboring points contain more informational content than the relationships between distant points. 
Although non-linear transformations have more potential than do linear transformations to lose information in the structure of the data, they also have more potential to position the data to cause more information to survive the projection. In this process, NLDR algorithms expose patterns and structures of lower dimensionality (manifolds) that exist in the original data. NLDR algorithms, or manifold learning algorithms, have the potential to make the high-level concepts embedded in multidimensional data accessible to both humans and machines.

This paper introduces a new algorithm for manifold learning called Manifold Sculpting, which discovers manifolds through a process of progressive refinement. Experiments show that it yields more accurate results than other algorithms in many cases. Additionally, it can be used as a post-processing step to enhance the transformation of other manifold learning algorithms.

2 Related Work

Many algorithms have been developed for performing non-linear dimensionality reduction. Recent works include Isomap [1], which solves for an isometric embedding of data into fewer dimensions with an algebraic technique. Unfortunately, it is somewhat computationally expensive, as it requires solving for the eigenvectors of a large dense matrix, and it has difficulty with poorly sampled areas of the manifold. (See Figure 1.A.) Locally Linear Embedding (LLE) [2] is able to perform a similar computation using a sparse matrix by using a metric that measures only relationships between vectors in local neighborhoods. Unfortunately, it produces distorted results when the sample density is non-uniform. (See Figure 1.B.) An improvement to the Isomap algorithm was later proposed that uses landmarks to reduce the amount of necessary computation [3]. (See Figure 1.C.) Many other NLDR algorithms have been proposed, including Kernel Principal Component Analysis [4], Laplacian Eigenmaps [5], Manifold Charting [6], Manifold Parzen Windows [7], Hessian LLE (HLLE) [8], and others [9, 10, 11]. Hessian LLE preserves the manifold structure better than the other algorithms but is, unfortunately, computationally expensive. (See Figure 1.D.)

In contrast with these algorithms, Manifold Sculpting is robust to sampling issues and still produces very accurate results. This algorithm iteratively transforms data by balancing two opposing heuristics: one that scales information out of unwanted dimensions, and one that preserves local structure in the data. Experimental results show that this technique preserves information into fewer dimensions with more accuracy than existing manifold learning algorithms. (See Figure 1.E.)

*mikegashler@gmail.com, ventura@cs.byu.edu, martinez@cs.byu.edu

Figure 1: Comparison of several manifold learners on a Swiss Roll manifold. Color is used to indicate how points in the results correspond to points on the manifold. Isomap and L-Isomap have trouble with sampling holes. LLE has trouble with changes in sample density.

3 The Algorithm

An overview of the Manifold Sculpting algorithm is given in Figure 2a.

Figure 2: δ and θ define the relationships that Manifold Sculpting attempts to preserve.

Step 1: Find the k nearest neighbors of each point. For each data point pi in P (where P is the set of all data points represented as vectors in Rn), find the k-nearest neighbors Ni (such that nij ∈ Ni is the jth neighbor of point pi).

Step 2: Compute relationships between neighbors. For each j (where 0 < j ≤ k), compute the Euclidean distance δij between pi and each nij ∈ Ni. Also compute the angle θij formed by the two line segments (pi to nij) and (nij to mij), where mij is the most colinear neighbor of nij with pi. (See Figure 2b.) The most colinear neighbor is the neighbor point that forms the angle closest to π. 
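Steps 1 and 2 can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name and the brute-force distance computation are our own choices for clarity.

```python
import numpy as np

def neighbor_relationships(P, k):
    """Steps 1-2: k nearest neighbors, the distances delta[i][j], and the
    angles theta[i][j] formed with the most colinear neighbor m_ij."""
    n = len(P)
    # Brute-force pairwise Euclidean distances (clear, not fast).
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    N = np.argsort(D, axis=1)[:, 1:k + 1]        # column 0 is the point itself
    delta = np.take_along_axis(D, N, axis=1)     # delta[i][j] = dist(p_i, n_ij)
    theta = np.zeros((n, k))
    for i in range(n):
        for j, nij in enumerate(N[i]):
            a = P[i] - P[nij]                    # segment n_ij -> p_i
            best = 0.0                           # angle of most colinear m so far
            for m in N[nij]:                     # candidate m_ij among n_ij's neighbors
                if m == i:
                    continue
                b = P[m] - P[nij]                # segment n_ij -> m
                cos_ang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
                ang = np.arccos(np.clip(cos_ang, -1.0, 1.0))
                if abs(ang - np.pi) < abs(best - np.pi):   # angle closest to pi wins
                    best = ang
            theta[i, j] = best
    delta_ave = delta.mean()                     # global average neighbor distance
    return N, delta, theta, delta_ave
```

On evenly spaced colinear points, the angle to the most colinear neighbor is π, as the definition above requires.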
The values of δ and θ are the relationships that the algorithm will attempt to preserve during transformation. The global average distance between all the neighbors of all points, δave, is also computed.

Step 3: Optionally preprocess the data. The data may optionally be preprocessed with the transformation step of Principal Component Analysis (PCA), or another efficient algorithm. Manifold Sculpting will work without this step; however, preprocessing can result in significantly faster convergence. To the extent that there is a linear component in the manifold, PCA will move the information in the data into as few dimensions as possible, thus leaving less work to be done in step 4 (which handles the non-linear component). This step is performed by computing the first |Dpres| principal components of the data (where Dpres is the set of dimensions that will be preserved in the projection), and rotating the dimensional axes to align with these principal components. (An efficient algorithm for computing principal components is presented in [12].)

Step 4: Transform the data. The data is iteratively transformed until some stopping criterion has been met. One effective technique is to stop when the total change of all points during the current iteration falls below a threshold. The best stopping criterion depends on the desired quality of results: if precision is important, the algorithm may iterate longer; if speed is important, it may stop earlier.

Step 4a: Scale values. All the values in Dscal (the set of dimensions that will be eliminated by the projection) are scaled by a constant factor σ, where 0 < σ < 1 (σ = 0.99 was used in this paper). Over time, the values in Dscal will converge to 0. When Dscal is dropped by the projection (step 5), there will be very little informational content left in these dimensions.

Step 4b: Restore original relationships. 
For each pi ∈ P, the values in Dpres are adjusted to recover the relationships that are distorted by scaling. Intuitively, this step simulates tension on the manifold surface. A heuristic error value is used to evaluate the current relationships among data points relative to the original relationships:

\[ \epsilon_{p_i} = \sum_{j=1}^{k} w_{ij}\left(\left(\frac{\delta_{ij}-\delta_{ij0}}{2\,\delta_{ave}}\right)^{2} + \left(\frac{\theta_{ij}-\theta_{ij0}}{\pi}\right)^{2}\right) \qquad (1) \]

where δij is the current distance to nij, δij0 is the original distance to nij measured in step 2, θij is the current angle, and θij0 is the original angle measured in step 2. The denominator values were chosen as normalizing factors because the value of the angle term can range from 0 to π, and the value of the distance term will tend to have a mean of about δave with some variance in both directions. We adjust the values in Dpres for each point to minimize this heuristic error value.

The order in which points are adjusted has some impact on the rate of convergence. Best results were obtained by employing a breadth-first neighborhood-graph traversal from a randomly selected point. (A new starting point is randomly selected for each iteration.) Intuitively, this may be analogous to the manner in which a person smooths a crumpled piece of paper by starting at an arbitrary point and smoothing outward. To further speed convergence, a higher weight wij is given to the component of the error contributed by neighbors that have already been adjusted in the current iteration. For all of our experiments, we use wij = 1 if nij has not yet been adjusted in this iteration, and wij = 10 if nij has been adjusted in this iteration.

Unfortunately, the equation for the true gradient of the error surface defined by this heuristic is complex, and is in O(|D|^3). 
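The per-point error of Eq. 1, and the per-dimension hill climbing used in its place, can be sketched as follows. The function names and the fixed step size are our own; the paper does not specify a step-size schedule.

```python
import numpy as np

def point_error(delta, delta0, theta, theta0, w, delta_ave):
    """Heuristic error of Eq. 1 for one point: weighted squared distance
    and angle distortions, summed over its k neighbors."""
    dist_term = ((np.asarray(delta) - delta0) / (2.0 * delta_ave)) ** 2
    angle_term = ((np.asarray(theta) - theta0) / np.pi) ** 2
    return float(np.sum(np.asarray(w) * (dist_term + angle_term)))

def hill_climb_step(p, error_of, pres_dims, step):
    """Adjust one point: in each preserved dimension, try moving +/- step
    and keep the move if it reduces the error."""
    best = error_of(p)
    for d in pres_dims:
        for s in (step, -step):
            q = p.copy()
            q[d] += s
            e = error_of(q)
            if e < best:
                p, best = q, e
                break                    # improvement found; next dimension
    return p, best
```

Here `error_of` stands for a closure that recomputes the point's current δ and θ against its neighbors and evaluates `point_error`; a full implementation would rebuild those quantities at every trial move.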
We therefore use the simple hill-climbing technique of adjusting in each dimension in the direction that yields improvement.

Since the error surface is not necessarily convex, the algorithm may potentially converge to local minima. At least three factors, however, mitigate this risk. First, the PCA pre-processing step often tends to move the whole system to a state somewhat close to the global minimum; even if a local minimum exists close to the globally optimal state, it may have a sufficiently small error to be acceptable. Second, every point has a unique error surface. Even if one point becomes temporarily stuck in a local minimum, its neighbors are likely to pull it out, or to change the topology of its error surface when their values are adjusted. Very particular conditions are necessary for every point to simultaneously find a local minimum. Third, by gradually scaling the values in Dscal (instead of directly setting them to 0), the system always remains in a state very close to the current globally optimal state. As long as it stays close to the current optimal state, it is unlikely for the error surface to change in a manner that permanently separates it from being able to reach the globally optimal state. (This is why all the dimensions need to be preserved in the PCA pre-processing step.) And perhaps most significantly, our experiments show that Manifold Sculpting generally tends to converge to very good results.

Figure 3: The mean squared error of four algorithms with a Swiss Roll manifold using a varying number of neighbors k. When k > 57, neighbor paths cut across the manifold. Isomap is more robust to this problem than other algorithms, but HLLE and Manifold Sculpting still yield better results. Results are shown on a logarithmic scale.

Step 5: Project the data. At this point Dscal contains only values that are very close to zero. 
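The outer loop of step 4, followed by the projection, can be sketched as follows. The `adjust` argument is a placeholder for the step-4b restoration pass; the function and parameter names are our own.

```python
import numpy as np

def sculpt(P, pres_dims, sigma=0.99, tol=1e-6, max_iter=100000, adjust=None):
    """Step 4 outer loop: scale the doomed dimensions by sigma each
    iteration, optionally restore relationships, and stop once the total
    movement falls below tol. Step 5 then drops the near-zero dimensions."""
    P = np.array(P, dtype=float)
    scal_dims = [d for d in range(P.shape[1]) if d not in pres_dims]
    for _ in range(max_iter):
        before = P.copy()
        P[:, scal_dims] *= sigma        # step 4a: scale D_scal toward zero
        if adjust is not None:
            P = adjust(P)               # step 4b: restore delta/theta relationships
        if np.abs(P - before).sum() < tol:
            break
    return P[:, list(pres_dims)]        # step 5: project by dropping D_scal
```

With `adjust=None` the preserved dimensions pass through unchanged, which makes the scale-until-converged behavior easy to verify in isolation.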
The data is projected by simply dropping these dimensions from the representation.

4 Empirical Results

Figure 1 shows that Manifold Sculpting appears visually to produce results of higher quality than LLE and Isomap with the Swiss Roll manifold, a common visual test for manifold learning algorithms. Quantitative analysis shows that it also yields better results than HLLE. Since the actual structure of this manifold is known prior to using any manifold learner, we can use this prior information to quantitatively measure the accuracy of each algorithm.

4.1 Varying number of neighbors.

We define a Swiss Roll in 3D space with n points (xi, yi, zi) for each 0 ≤ i < n, such that xi = t sin(t), yi is a random number −6 ≤ yi < 6, and zi = t cos(t), where t = 8i/n + 2. In 2D manifold coordinates, the point is (ui, vi), such that

\[ u_i = \frac{\sinh^{-1}(t) + t\sqrt{t^2 + 1}}{2} \quad \text{and} \quad v_i = y_i. \]

We created a Swiss Roll with 2000 data points and reduced the dimensionality to 2 with each of four algorithms. Next we tested how well these results align with the expected values by measuring the mean squared distance from each point to its expected value. (See Figure 3.) We rotated, scaled, and translated the values as required to obtain the minimum possible error measurement for each algorithm. These results are consistent with a qualitative assessment of Figure 1. Results are shown with a varying number of neighbors k. In this example, when k = 57, local neighborhoods begin to cut across the manifold. Isomap is more robust to this problem than other algorithms, but HLLE and Manifold Sculpting still yield better results.

Figure 4: The mean squared error of points from an S-Curve manifold for four algorithms with a varying number of data points. Manifold Sculpting shows a trend of increasing accuracy with an increasing number of points. This experiment was performed with 20 neighbors. 
Results are shown on a logarithmic scale.

4.2 Varying sample densities.

A similar experiment was performed with an S-Curve manifold. We defined the S-Curve points in 3D space with n points (xi, yi, zi) for each 0 ≤ i < n, such that xi = t, yi = sin(t), and zi is a random number 0 ≤ zi < 2, where t = (2.2i − 0.1)π/n. In 2D manifold coordinates, the point is (ui, vi), such that

\[ u_i = \int_0^t \sqrt{\cos^2(w) + 1}\; dw \quad \text{and} \quad v_i = y_i. \]

Figure 4 shows the mean squared error of the transformed points from their expected values, using the same regression technique described for the experiment with the Swiss Roll problem. We varied the sampling density to show how this affects each algorithm. A trend can be observed in this data that as the number of sample points increases, the quality of results from Manifold Sculpting also increases. This trend does not appear in the results from other algorithms.

One drawback to the Manifold Sculpting algorithm is that convergence may take longer when the value for k is too small. This experiment was also performed with 6 neighbors, but Manifold Sculpting did not always converge within a reasonable time when so few neighbors were used. The other three algorithms do not have this limitation, but the quality of their results still tends to be poor when very few neighbors are used.

4.3 Entwined spirals manifold.

A test was also performed with an Entwined Spirals manifold. In this case, Isomap was able to produce better results than Manifold Sculpting (see Figure 5), even though Isomap yielded the worst accuracy in previous problems. This can be attributed to the nature of the Isomap algorithm. In cases where the manifold has an intrinsic dimensionality of exactly 1, a path from neighbor to neighbor provides an accurate estimate of isolinear distance. 
Thus an algorithm that seeks to globally optimize isolinear distances will be less susceptible to the noise from cutting across local corners. When the intrinsic dimensionality is higher than 1, however, paths that follow from neighbor to neighbor produce a zig-zag pattern that introduces excessive noise into the isolinear distance measurement. In these cases, preserving local neighborhood relationships with precision yields better overall results than globally optimizing an error-prone metric. Consistent with this intuition, Isomap is the closest competitor to Manifold Sculpting in other experiments that involved a manifold with a single intrinsic dimension, and yields the poorest results of the four algorithms when the intrinsic dimensionality is larger than one.

Figure 5: Mean squared error for four algorithms with an Entwined Spirals manifold.

4.4 Image-based manifolds.

The accuracy of Manifold Sculpting is not limited to generated manifolds in three-dimensional space. Unfortunately, the manifold structure represented by most real-world problems is not known a priori. The accuracy of a manifold learner, however, can still be estimated when the problem involves a video sequence by simply counting the percentage of frames that are sorted into the same order as the video sequence. Figure 6 shows several frames from a video sequence of a person turning his head while gradually smiling. Each image was encoded as a vector of 1,634 pixel intensity values. This data was then reduced to a single dimension. (Results are shown on three separate lines in order to fit the page.) The one preserved dimension could then characterize each frame according to the high-level concepts that were previously encoded in many dimensions. The dot below each image corresponds to the single-dimensional value in the preserved dimension for that image. 
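The ordering-based accuracy just described can be sketched as follows. Counting the fraction of consecutive frame pairs whose one-dimensional values preserve the video order is one plausible reading of the metric; the paper does not spell out its exact counting rule, so treat this as an assumption.

```python
import numpy as np

def ordering_accuracy(values):
    """Fraction of consecutive frame pairs whose 1-D embedding values keep
    the video's order, allowing either overall direction along the axis."""
    diffs = np.diff(np.asarray(values, dtype=float))
    # The embedding may run in either direction; pick the dominant one.
    direction = 1.0 if np.sum(diffs > 0) >= np.sum(diffs < 0) else -1.0
    return float(np.mean(direction * diffs > 0))
```

A perfectly monotonic embedding scores 1.0 regardless of which end of the axis the first frame lands on.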
In this case, the ordering of every frame was consistent with the video sequence.

4.5 Controlled manifold topologies.

Figure 7 shows a comparison of results obtained from a manifold generated by translating an image over a background of random noise. Nine of the 400 input images are shown as a sample, and results with each algorithm are shown as a mesh. Each vertex is placed at a position corresponding to the two values obtained from one of the 400 images. For increased visibility of the inherent structure, the vertices are connected with their nearest input-space neighbors. Because two variables (horizontal position and vertical position) were used to generate the dataset, this data creates a manifold with an intrinsic dimensionality of two in a space with an extrinsic dimensionality of 2,401 (the total number of pixels in each image). Because the background is random, the average distance between neighboring points in the input space is uniform, so the ideal result is known to be a square. The distortions produced by Manifold Sculpting tend to be local in nature, while the distortions produced by other algorithms tend to be more global. Note that the points are spread nearly uniformly across the manifold in the results from Manifold Sculpting. This explains why the results from Manifold Sculpting tend to fit the ideal results with much lower total error (as shown in Figure 3 and Figure 4). Perhaps more significantly, it also tends to keep the intrinsic variables in the dataset more linearly separable. This is particularly important when the dimensionality reduction is used as a pre-processing step for a supervised learning algorithm.

Figure 6: Images of a face reduced by Manifold Sculpting into a single dimension. The values are shown here on three wrapped lines in order to fit the page. The original image is shown above each point.

Figure 7: A comparison of results with a manifold generated by translating an image over a background of noise. Manifold Sculpting tends to produce less global distortion, while other algorithms tend to produce less local distortion. Each point represents an image. This experiment was done in each case with 8 neighbors. (LLE fails to yield results with these parameters, but [13] reports a similar experiment in which LLE produces results. In that case, as with Isomap and HLLE as shown here, distortion is clearly visible near the edges.)

We created four video sequences designed to show various types of manifold topologies and measured the accuracy of each manifold learning algorithm. These results (and sample frames from each video) are shown in Figure 8. The first video shows a rotating stuffed animal. Since the background pixels remain nearly constant while the pixels on the rotating object change in value, the manifold corresponding to the vector encoding of this video will contain both smooth and changing areas. The second video was made by moving a camera down a hallway. This produces a manifold with a continuous range of variability, since pixels near the center of the frame change slowly while pixels near the edges change rapidly. The third video pans across a scene. Unlike the video of the rotating stuffed animal, there are no background pixels that remain constant. The last video shows another rotating stuffed animal. Unlike the first video, however, the high-contrast texture of the object used in this video results in a topology with much more variation. As the black spots shift across the pixels, a manifold is created that swings wildly in the respective dimensions. Due to the large hills and valleys in the topology of this manifold, the nearest neighbors of a frame frequently create paths that cut across the manifold. 
In all four cases, Manifold Sculpting produced results competitive with Isomap, which does particularly well with manifolds that have an intrinsic dimensionality of one, but Manifold Sculpting is not limited by the intrinsic dimensionality, as shown in the previous experiments.

Figure 8: Four video sequences were created with varying properties in the corresponding manifolds. Dimensionality was reduced to one with each of four manifold learning algorithms. The percentage of frames that were correctly ordered by each algorithm is shown.

5 Discussion

The experiments presented in this paper show that Manifold Sculpting yields more accurate results than other well-known manifold learning algorithms. Manifold Sculpting is robust to holes in the sampled area. Manifold Sculpting is more accurate than other algorithms when the manifold is sparsely sampled, and the gap is even wider at higher sampling densities. Manifold Sculpting has difficulty when the selected number of neighbors is too small but consistently outperforms other algorithms when it is larger.

Due to the iterative nature of Manifold Sculpting, it is difficult to produce a valid complexity analysis. Consequently, we measured the scalability of Manifold Sculpting empirically and compared it with that of HLLE, L-Isomap, and LLE. Due to space constraints these results are not included here, but they indicate that Manifold Sculpting scales better than the other algorithms when the number of data points is much larger than the number of input dimensions.

Manifold Sculpting benefits significantly when the data is pre-processed with the transformation step of PCA. 
The transformation step of any algorithm may be used in place of this step. Current research seeks to identify which algorithms work best with Manifold Sculpting to efficiently produce high-quality results. (An implementation of Manifold Sculpting is included at http://waffles.sourceforge.net.)

References

[1] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.

[2] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.

[3] Vin de Silva and Joshua B. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. In NIPS, pages 705–712, 2002.

[4] Bernhard Schölkopf, Alexander J. Smola, and Klaus-Robert Müller. Kernel principal component analysis. Advances in Kernel Methods: Support Vector Learning, pages 327–352, 1999.

[5] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems 14, pages 585–591, 2001.

[6] Matthew Brand. Charting a manifold. In Advances in Neural Information Processing Systems 15, pages 961–968. MIT Press, Cambridge, MA, 2003.

[7] Pascal Vincent and Yoshua Bengio. Manifold Parzen windows. In Advances in Neural Information Processing Systems 15, pages 825–832. MIT Press, Cambridge, MA, 2003.

[8] D. Donoho and C. Grimes. Hessian eigenmaps: locally linear embedding techniques for high dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.

[9] Yoshua Bengio and Martin Monperrus. Non-local manifold tangent learning. In Advances in Neural Information Processing Systems 17, pages 129–136. MIT Press, Cambridge, MA, 2005.

[10] Elizaveta Levina and Peter J. Bickel. Maximum likelihood estimation of intrinsic dimension. In NIPS, 2004.

[11] Zhenyue Zhang and Hongyuan Zha. A domain decomposition method for fast manifold learning. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, MA, 2006.

[12] Sam Roweis. EM algorithms for PCA and SPCA. In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems 10, 1998.

[13] Lawrence K. Saul and Sam T. Roweis. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4:119–155, 2003.
", "award": [], "sourceid": 690, "authors": [{"given_name": "Michael", "family_name": "Gashler", "institution": null}, {"given_name": "Dan", "family_name": "Ventura", "institution": null}, {"given_name": "Tony", "family_name": "Martinez", "institution": null}]}