{"title": "Charting a Manifold", "book": "Advances in Neural Information Processing Systems", "page_first": 985, "page_last": 992, "abstract": null, "full_text": "Charting a Manifold\n\nMatthew Brand\n\nMitsubishi Electric Research Labs\n\n201 Broadway, Cambridge MA 02139 USA\n\nwww.merl.com/people/brand/\n\nAbstract\n\nWe construct a nonlinear mapping from a high-dimensional sample space\nto a low-dimensional vector space, effectively recovering a Cartesian\ncoordinate system for the manifold from which the data is sampled.\nThe mapping preserves local geometric relations in the manifold and is\npseudo-invertible. We show how to estimate the intrinsic dimensionality\nof the manifold from samples, decompose the sample data into locally\nlinear low-dimensional patches, merge these patches into a single low-\ndimensional coordinate system, and compute forward and reverse map-\npings between the sample and coordinate spaces. The objective functions\nare convex and their solutions are given in closed form.\n\n1 Nonlinear dimensionality reduction (NLDR) by charting\n\nCharting is the problem of assigning a low-dimensional coordinate system to data points\nin a high-dimensional sample space. It is presumed that the data lies on or near a low-\ndimensional manifold embedded in the sample space, and that there exists a 1-to-1 smooth\nnonlinear transform between the manifold and a low-dimensional vector space. The data-\nmodeler\u2019s goal is to estimate smooth continuous mappings between the sample and co-\nordinate spaces. 
Often this analysis will shed light on the intrinsic variables of the data-generating phenomenon, for example, revealing perceptual or configuration spaces.

Our goal is to find a mapping, expressed as a kernel-based mixture of linear projections, that minimizes information loss about the density and relative locations of sample points. This constraint is expressed in a posterior that combines a standard gaussian mixture model (GMM) likelihood function with a prior that penalizes uncertainty due to inconsistent projections in the mixture. Section 3 develops a special case where this posterior is unimodal and maximizable in closed form, yielding a GMM whose covariances reveal a patchwork of overlapping locally linear subspaces that cover the manifold. Section 4 shows that for this (or any) GMM and a choice of reduced dimension d, there is a unique, closed-form solution for a minimally distorting merger of the subspaces into a d-dimensional coordinate space, as well as a reverse mapping defining the surface of the manifold in the sample space. The intrinsic dimensionality d of the data manifold can be estimated from the growth process of point-to-point distances. In analogy to differential geometry, we call the subspaces "charts" and their merger the "connection." Section 5 considers example problems where these methods are used to untie knots, unroll and untwist sheets, and visualize video data.

1.1 Background

Topology-neutral NLDR algorithms can be divided into those that compute mappings, and those that directly compute low-dimensional embeddings. The field has its roots in mapping algorithms: DeMers and Cottrell [3] proposed using auto-encoding neural networks with a hidden-layer "bottleneck," effectively casting dimensionality reduction as a compression problem.
Hastie defined principal curves [5] as nonparametric 1D curves that pass through the center of "nearby" data points. A rich literature has grown up around properly regularizing this approach and extending it to surfaces. Smola and colleagues [10] analyzed the NLDR problem in the broader framework of regularized quantization methods.

More recent advances aim for embeddings: Gomes and Mojsilovic [4] treat manifold completion as an anisotropic diffusion problem, iteratively expanding points until they connect to their neighbors. The ISOMAP algorithm [12] represents remote distances as sums of a trusted set of distances between immediate neighbors, then uses multidimensional scaling to compute a low-dimensional embedding that minimally distorts all distances. The locally linear embedding algorithm (LLE) [9] represents each point as a weighted combination of a trusted set of nearest neighbors, then computes a minimally distorting low-dimensional barycentric embedding. The two have complementary strengths: ISOMAP handles holes well but can fail if the data hull is nonconvex [12]; and vice versa for LLE [9]. Both offer embeddings without mappings. It has been noted that trusted-set methods are vulnerable to noise because they consider the subset of point-to-point relationships that has the lowest signal-to-noise ratio; small changes to the trusted set can induce large changes in the set of constraints on the embedding, making solutions unstable [1].

In a return to mapping, Roweis and colleagues [8] proposed global coordination: learning a mixture of locally linear projections from sample to coordinate space. They constructed a posterior that penalizes distortions in the mapping, and gave an expectation-maximization (EM) training rule. Innovative use of variational methods highlighted the difficulty of even hill-climbing their multimodal posterior.
Like [2, 7, 6, 8], the method we develop below is a decomposition of the manifold into locally linear neighborhoods. It bears closest relation to global coordination [8], although by a different construction of the problem, we avoid hill-climbing a spiky posterior and instead develop a closed-form solution.

2 Estimating locally linear scale and intrinsic dimensionality

We begin with a matrix of sample points $Y := [y_1, \dots, y_N]$, $y_n \in \mathbb{R}^D$, populating a D-dimensional sample space, and a conjecture that these points are samples from a manifold M of intrinsic dimensionality d < D. We seek a mapping onto a vector space $G(Y) \to X := [x_1, \dots, x_N]$, $x_n \in \mathbb{R}^d$, and a 1-to-1 reverse mapping $G^{-1}(X) \to Y$ such that local relations between nearby points are preserved (this will be formalized below). The map G should be non-catastrophic, that is, without folds: parallel lines on the manifold in $\mathbb{R}^D$ should map to continuous smooth non-intersecting curves in $\mathbb{R}^d$. This guarantees that linear operations on X such as interpolation will have reasonable analogues on Y.

Smoothness means that at some scale r the mapping from a neighborhood on M to $\mathbb{R}^d$ is effectively linear. Consider a ball of radius r centered on a data point and containing n(r) data points. The count n(r) grows as $r^d$, but only at the locally linear scale; the growth rate is inflated by isotropic noise at smaller scales and by embedding curvature at larger scales. To estimate r, we look at how the r-ball grows as points are added to it, tracking $c(r) := d\log r / d\log n(r)$. At noise scales, $c(r) \approx 1/D < 1/d$, because noise has distributed points in all directions with equal probability. At the scale at which curvature becomes significant, $c(r) < 1/d$, because the manifold is no longer perpendicular to the surface of the ball, so the ball does not have to grow as fast to accommodate new points.
At the locally linear scale, the process peaks at $c(r) = 1/d$, because points are distributed only in the directions of the manifold's local tangent space. The maximum of c(r) therefore gives an estimate of both the scale and the local dimensionality of the manifold (see figure 1), provided that the ball hasn't expanded to a manifold boundary: boundaries have lower dimension than the manifold. For low-dimensional manifolds such as sheets, the boundary submanifolds (edges and corners) are very small relative to the full manifold, so the boundary effect is typically limited to a small rise in c(r) as r approaches the scale of the entire data set.

Figure 1: Point growth processes. LEFT: Scale behavior of a 1D manifold in 2-space (showing the samples and their noise, locally linear, and curvature scales). At the locally linear scale, the number of points in an r-ball grows as $r^d$; at noise and curvature scales it grows faster. RIGHT: Point-count growth process on a 2D manifold in 3-space, plotted as radius versus #points on log-log axes; the process is used to find the intrinsic dimensionality of a 2D manifold nonlinearly embedded in 3-space (see figure 2). Lines of slope 1/3, 1/2, and 1 are fitted to sections of the log r / log n_r curve. For neighborhoods of radius r ≈ 1 with roughly n ≈ 10 points, the slope peaks at 1/2, indicating a dimensionality of d = 2. Below that, the data appears 3D because it is dominated by noise (except for n ≤ D points); above, the data appears >2D because of manifold curvature. As the r-ball expands to cover the entire data set the dimensionality appears to drop to 1 as the process begins to track the 1D edges of the 2D sheet.
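As a concrete illustration (our own numpy sketch, not the paper's code; function names are hypothetical), the growth statistic, the slope of log r against log n(r), which peaks near 1/d at the locally linear scale, can be computed from sorted pairwise distances:

```python
import numpy as np

def growth_stat(Y, n_max=None):
    """For each point, r(n) is the distance to its nth nearest neighbor.
    Return neighbor counts n and the averaged slope c(n) = d log r / d log n,
    which is ~1/d at the locally linear scale."""
    N = len(Y)
    n_max = n_max or N - 1
    dist = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    r = np.sort(dist, axis=1)[:, 1:n_max + 1]     # drop self-distance 0
    n = np.arange(1, n_max + 1)
    logr = np.log(r).mean(axis=0)                 # average log-radius growth
    c = np.gradient(logr, np.log(n))              # slope of log r vs log n
    return n, c

def estimate_dim(Y, n_min=5, n_max=None):
    """Estimate intrinsic dimensionality as round(1 / peak of c),
    skipping the smallest (noise-dominated) neighbor counts."""
    n, c = growth_stat(Y, n_max)
    return int(round(1.0 / c[n >= n_min].max()))
```

The peak location in n also indicates the locally linear scale r; restricting n_max keeps the statistic away from the data-set-scale boundary effect described above.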
In practice, our code simply expands an r-ball at every point and looks for the first peak in c(r), averaged over many nearby r-balls. One can estimate d and r globally or per-point.

3 Charting the data

In the charting step we find a soft partitioning of the data into locally linear low-dimensional neighborhoods, as a prelude to computing the connection that gives the global low-dimensional embedding. To minimize information loss in the connection, we require that the data points project into a subspace associated with each neighborhood with (1) minimal loss of local variance and (2) maximal agreement of the projections of nearby points into nearby neighborhoods. Criterion (1) is served by maximizing the likelihood function of a gaussian mixture model (GMM) density fitted to the data:

\[ p(y_i|\mu,\Sigma) := \sum_j p(y_i|\mu_j,\Sigma_j)\,p_j = \sum_j \mathcal{N}(y_i;\mu_j,\Sigma_j)\,p_j. \tag{1} \]

Each gaussian component defines a local neighborhood centered around $\mu_j$ with axes defined by the eigenvectors of $\Sigma_j$. The amount of data variance along each axis is indicated by the eigenvalues of $\Sigma_j$; if the data manifold is locally linear in the vicinity of the $\mu_j$, all but the d dominant eigenvalues will be near-zero, implying that the associated eigenvectors constitute the optimal variance-preserving local coordinate system. To some degree likelihood maximization will naturally realize this property: it requires that the GMM components shrink in volume to fit the data as tightly as possible, which is best achieved by positioning the components so that they "pancake" onto locally flat collections of data points. However, this state of affairs is easily violated by degenerate (zero-variance) GMM components or by components fitted to overly small locales where the data density off the manifold is comparable to the density on the manifold (e.g., at the noise scale).
Consequently a prior is needed.

Criterion (2) implies that neighboring partitions should have dominant axes that span similar subspaces, since disagreement (large subspace angles) would lead to inconsistent projections of a point and therefore uncertainty about its location in a low-dimensional coordinate space. The principal insight is that criterion (2) is exactly the cost of coding the location of a point in one neighborhood when it is generated by another neighborhood: the cross-entropy between the gaussian models defining the two neighborhoods:

\[ D(N_1 \| N_2) = \int dy\, \mathcal{N}(y;\mu_1,\Sigma_1) \log \frac{\mathcal{N}(y;\mu_1,\Sigma_1)}{\mathcal{N}(y;\mu_2,\Sigma_2)} = \Big( \log|\Sigma_1^{-1}\Sigma_2| + \mathrm{trace}(\Sigma_2^{-1}\Sigma_1) + (\mu_2-\mu_1)^\top \Sigma_2^{-1} (\mu_2-\mu_1) - D \Big)/2. \tag{2} \]

Roughly speaking, the terms in (2) measure differences in size, orientation, and position, respectively, of two coordinate frames located at the means $\mu_1, \mu_2$ with axes specified by the eigenvectors of $\Sigma_1, \Sigma_2$. All three terms decline to zero as the overlap between the two frames is maximized. To maximize consistency between adjacent neighborhoods, we form the prior $p(\mu,\Sigma) \propto \exp[-\sum_{i \neq j} m_i(\mu_j)\, D(N_i \| N_j)]$, where $m_i(\mu_j)$ is a measure of co-locality. Unlike global coordination [8], we are not asking that the dominant axes in neighboring charts be aligned, only that they span nearly the same subspace. This is a much easier objective to satisfy, and it contains a useful special case where the posterior $p(\mu,\Sigma|Y) \propto \prod_i p(y_i|\mu,\Sigma)\, p(\mu,\Sigma)$ is unimodal and can be maximized in closed form: let us associate a gaussian neighborhood with each data point, setting $\mu_i = y_i$; take all neighborhoods to be a priori equally probable, setting $p_i = 1/N$; and let the co-locality measure be determined from some local kernel.
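The cross-entropy (2) between two gaussian neighborhoods is straightforward to evaluate; the following is a direct numpy transcription (ours, for illustration):

```python
import numpy as np

def gauss_kl(mu1, S1, mu2, S2):
    """KL divergence D(N1 || N2) between gaussians N(mu1, S1) and N(mu2, S2):
    (log|S1^-1 S2| + trace(S2^-1 S1) + (mu2-mu1)' S2^-1 (mu2-mu1) - D) / 2."""
    D = len(mu1)
    S2inv = np.linalg.inv(S2)
    # log|S1^-1 S2| = log|S2| - log|S1|, computed stably via slogdet
    logdet = np.linalg.slogdet(S2)[1] - np.linalg.slogdet(S1)[1]
    quad = (mu2 - mu1) @ S2inv @ (mu2 - mu1)
    return 0.5 * (logdet + np.trace(S2inv @ S1) + quad - D)
```

Note the asymmetry: D(N1‖N2) penalizes mass of N1 that N2 assigns low density, which is exactly the coding-cost reading above.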
For example, in this paper we use $m_i(\mu_j) \propto \mathcal{N}(\mu_j; \mu_i, \sigma^2)$, with the scale parameter $\sigma$ specifying the expected size of a neighborhood on the manifold in sample space. A reasonable choice is $\sigma = r/2$, so that erf(2) > 99.5% of the density of $m_i(\mu_j)$ is contained in the area around $y_i$ where the manifold is expected to be locally linear. With uniform $p_i$, and with $\mu_i$ and $m_i(\mu_j)$ fixed, the MAP estimates of the GMM covariances are

\[ \Sigma_i = \sum_j m_i(\mu_j) \Big( (y_j-\mu_i)(y_j-\mu_i)^\top + (\mu_j-\mu_i)(\mu_j-\mu_i)^\top + \Sigma_j \Big) \bigg/ \sum_j m_i(\mu_j). \tag{3} \]

Note that each covariance $\Sigma_i$ is dependent on all the other $\Sigma_j$. The MAP estimators for all covariances can be arranged into a set of fully constrained linear equations and solved exactly for their mutually optimal values. This key step brings nonlocal information about the manifold's shape into the local description of each neighborhood, ensuring that adjoining neighborhoods have similar covariances and small angles between their respective subspaces. Even if a local subset of data points is dense in a direction perpendicular to the manifold, the prior encourages the local chart to orient parallel to the manifold as part of a globally optimal solution, protecting against a pathology noted in [8]. Equation (3) is easily adapted to give a reduced number of charts and/or charts centered on local centroids.

4 Connecting the charts

We now build a connection for a set of charts specified as an arbitrary nondegenerate GMM. A GMM gives a soft partitioning of the dataset into neighborhoods of mean $\mu_k$ and covariance $\Sigma_k$.
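For intuition, a single sweep of the update (3) can be evaluated directly (our own numpy sketch; the paper solves all covariances jointly as one linear system, whereas one sweep from an isotropic initializer merely illustrates how the charts orient along the local manifold directions):

```python
import numpy as np

def chart_covariances(Y, sigma, S_init=None):
    """One sweep of the MAP covariance update (3), with one chart per point
    (mu_i = y_i) and co-locality kernel m_i(mu_j) ~ N(mu_j; mu_i, sigma^2)."""
    N, D = Y.shape
    S = np.tile(np.eye(D) * sigma ** 2, (N, 1, 1)) if S_init is None else S_init
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    M = np.exp(-0.5 * d2 / sigma ** 2)
    M /= M.sum(axis=1, keepdims=True)          # normalize by sum_j m_i(mu_j)
    diff = Y[None, :, :] - Y[:, None, :]       # y_j - mu_i (= mu_j - mu_i here)
    outer = np.einsum('ijk,ijl->ijkl', diff, diff)
    # with mu_i = y_i the two outer-product terms of (3) coincide
    return np.einsum('ij,ijkl->ikl', M, 2.0 * outer + S)
```

On points sampled along a line, the resulting covariances are elongated along the line even though the initializer is isotropic, which is the "pancaking" behavior described above.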
The optimal variance-preserving low-dimensional coordinate system for each neighborhood derives from its weighted principal component analysis, which is exactly specified by the eigenvectors of its covariance matrix: eigendecompose $\Sigma_k = V_k \Lambda_k V_k^\top$ with eigenvalues in descending order on the diagonal of $\Lambda_k$, and let $W_k := [I_d, 0]\, V_k^\top$ be the operator projecting points into the kth local chart, such that the local chart coordinate is $u_{ki} := W_k (y_i - \mu_k)$ and $U_k := [u_{k1}, \dots, u_{kN}]$ holds the local coordinates of all points.

Our goal is to sew together all charts into a globally consistent low-dimensional coordinate system. For each chart there will be a low-dimensional affine transform $G_k \in \mathbb{R}^{(d+1)\times d}$ that projects $U_k$ into the global coordinate space. Summing over all charts, the weighted average of the projections of point y into the low-dimensional vector space is

\[ \hat{x}|y := \sum_j G_j \begin{bmatrix} W_j(y-\mu_j) \\ 1 \end{bmatrix} p_{j|y}(y) \quad \Rightarrow \quad \hat{x}_i|y_i := \sum_j G_j \begin{bmatrix} u_{ji} \\ 1 \end{bmatrix} p_{j|y}(y_i), \tag{4} \]

where $p_{k|y}(y) \propto p_k\, \mathcal{N}(y;\mu_k,\Sigma_k)$, $\sum_k p_{k|y}(y) = 1$, is the probability that chart k generates point y. As pointed out in [8], if a point has nonzero probabilities in two charts, then there should be affine transforms of those two charts that map the point to the same place in a global coordinate space. We set this up as a weighted least-squares problem:

\[ G := [G_1, \dots, G_K] = \arg\min_{G_k, G_j} \sum_{j \neq k} \sum_i p_{k|y}(y_i)\, p_{j|y}(y_i) \left\| G_k \begin{bmatrix} u_{ki} \\ 1 \end{bmatrix} - G_j \begin{bmatrix} u_{ji} \\ 1 \end{bmatrix} \right\|_F^2. \tag{5} \]

Equation (5) generates a homogeneous set of equations that determines a solution up to an affine transform of G. There are two solution methods.
First, let us temporarily anchor one neighborhood at the origin to fix this indeterminacy. This adds the constraint $G_1 = [I, 0]^\top$. To solve, define the indicator matrix $F_k := [0, \dots, 0, I, 0, \dots, 0]^\top$ with the identity matrix occupying the kth block, such that $G_k = G F_k$. Let the diagonal of $P_k := \mathrm{diag}([p_{k|y}(y_1), \dots, p_{k|y}(y_N)])$ record the per-point posteriors of chart k. The squared error of the connection is then a sum of all patch-to-anchor and patch-to-patch inconsistencies:

\[ E := \sum_k \left\| \Big( G \bar{U}_k - \begin{bmatrix} U_1 \\ 0 \end{bmatrix} \Big) P_k P_1 \right\|_F^2 + \sum_k \sum_{j \neq k} \left\| (G \bar{U}_j - G \bar{U}_k)\, P_j P_k \right\|_F^2, \qquad \bar{U}_k := F_k \begin{bmatrix} U_k \\ 1 \end{bmatrix}. \tag{6} \]

Setting dE/dG = 0 and solving to minimize the convex E gives

\[ G^\top = \bigg( \sum_k \bar{U}_k P_k^2 \Big( \sum_{j \neq k} P_j^2 \Big) \bar{U}_k^\top - \sum_k \sum_{j \neq k} \bar{U}_k P_k^2 P_j^2\, \bar{U}_j^\top \bigg)^{-1} \bigg( \sum_k \bar{U}_k P_k^2 P_1^2 \begin{bmatrix} U_1 \\ 0 \end{bmatrix}^\top \bigg). \tag{7} \]

We now remove the dependence on a reference neighborhood $G_1$ by rewriting equation (5):

\[ G = \arg\min_G \sum_{j \neq k} \| (G \bar{U}_j - G \bar{U}_k)\, P_j P_k \|_F^2 = \| G Q \|_F^2 = \mathrm{trace}(G Q Q^\top G^\top), \tag{8} \]

where Q is the matrix concatenating the blocks $(\bar{U}_j - \bar{U}_k) P_j P_k$ over all pairs $j \neq k$. If we require that $G G^\top = I$ to prevent degenerate solutions, then equation (8) is solved (up to rotation in coordinate space) by setting $G^\top$ to the eigenvectors associated with the smallest eigenvalues of $Q Q^\top$.
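A small numpy sketch of the second method (ours, for illustration; here Q is formed explicitly, whereas the text notes it can be avoided): stack the pairwise inconsistency blocks into Q and take the trailing eigenvectors of QQ' as the rows of G, per equation (8).

```python
import numpy as np

def solve_connection(U_list, P_list, d):
    """Solve equation (8). U_list[k] is the padded local-coordinate matrix
    F_k [U_k; 1] (shape K(d+1) x N); P_list[k] holds the per-point posteriors
    of chart k (the diagonal of P_k). Returns G (d x K(d+1)) with GG' = I."""
    K = len(U_list)
    blocks = [(U_list[j] - U_list[k]) @ np.diag(P_list[j] * P_list[k])
              for k in range(K) for j in range(K) if j != k]
    Q = np.hstack(blocks)
    evals, evecs = np.linalg.eigh(Q @ Q.T)     # ascending eigenvalues
    return evecs[:, :d].T                      # trailing eigenvectors as rows
```

For two 1D charts whose coordinates differ by a unit shift, any G in the null space of Q' maps both charts consistently, so the recovered row satisfies the affine-consistency constraints exactly.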
The eigenvectors can be computed efficiently without explicitly forming $Q Q^\top$; other numerical efficiencies obtain by zeroing any vanishingly small probabilities in each $P_k$, yielding a sparse eigenproblem. A more interesting strategy is to numerically condition the problem by calculating the trailing eigenvectors of $Q Q^\top + \mathbf{1}\mathbf{1}^\top$. It can be shown that this maximizes the posterior $p(G|Q) \propto p(Q|G)\, p(G) \propto e^{-\|GQ\|_F^2} e^{-\|G\mathbf{1}\|^2}$, where the prior p(G) favors a mapping G whose unit-norm rows are also zero-mean. This maximizes variance in each row of G and thereby spreads the projected points broadly and evenly over coordinate space.

The solutions for MAP charts (equation (3)) and connection (equation (8)) can be applied to any well-fitted mixture of gaussians/factors/PCAs density model;¹ thus large eigenproblems can be avoided by connecting just a small number of charts that cover the data.

¹We thank reviewers for calling our attention to Teh & Roweis ([11], in this volume), which shows how to connect a set of given local dimensionality reducers in a generalized eigenvalue problem that is related to equation (8).

Figure 2: The twisted curl problem. LEFT: Comparison of charting, ISOMAP, & LLE (panels: original data (linked); embedding, XY view; XYZ view; local charts of random subsets; LLE with n = 5 through n = 10; charting; best Isomap; best LLE (regularized); charting projection onto coordinate space; reconstruction from the back-projected coordinate grid). 400 points are randomly sampled from the manifold with noise. Charting is the only method that recovers the original space without catastrophes (folding), albeit with some shear.
RIGHT: The manifold is regularly sampled (with noise) to illustrate the forward and backward projections. Samples are shown linked into lines to help visualize the manifold structure. Coordinate axes of a random selection of charts are shown as bold lines. Connecting subsets of charts such as this will also give good mappings. The upper right quadrant shows various LLE results. At bottom we show the charting solution and the reconstructed (back-projected) manifold, which smooths out the noise.

Once the connection is solved, equation (4) gives the forward projection of any point y down into coordinate space. There are several numerically distinct candidates for the back-projection: posterior mean, mode, or exact inverse. In general, there may not be a unique posterior mode and the exact inverse is not solvable in closed form (this is also true of [8]). Note that chart-wise projection defines a complementary density in coordinate space

\[ p_{x|k}(x) = \mathcal{N}\!\left( x;\; G_k \begin{bmatrix} 0 \\ 1 \end{bmatrix},\; G_k \begin{bmatrix} [I_d, 0]\, \Lambda_k\, [I_d, 0]^\top & 0 \\ 0 & 0 \end{bmatrix} G_k^\top \right). \tag{9} \]

Let p(y|x, k), used to map x into subspace k on the surface of the manifold, be a Dirac delta function whose mean is a linear function of x. Then the posterior mean back-projection is obtained by integrating out uncertainty over which chart generates x:

\[ \hat{y}|x = \sum_k p_{k|x}(x) \left( \mu_k + W_k^\top \Big( G_k \begin{bmatrix} I \\ 0 \end{bmatrix} \Big)^{+} \Big( x - G_k \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Big) \right), \tag{10} \]

where $(\cdot)^{+}$ denotes pseudo-inverse. In general, a back-projecting map should not reconstruct the original points.
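The posterior-mean back-projection (10) can be sketched as follows (our own illustration; for brevity the chart responsibilities p(k|x) are a softmax over distances to the chart centers in coordinate space, a simplification of the chart-wise density (9)):

```python
import numpy as np

def back_project(x, charts, tau=1.0):
    """Posterior-mean back-projection of equation (10). Each chart supplies
    mu (D,), the projector W (d x D), and the affine map G (d x (d+1))
    taking homogeneous local coordinates [u; 1] into coordinate space."""
    centers = np.array([c['G'][:, -1] for c in charts])     # G_k [0; 1]
    logw = -((x - centers) ** 2).sum(axis=1) / (2 * tau ** 2)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                            # p(k|x), simplified
    y = np.zeros_like(charts[0]['mu'])
    for wk, c in zip(w, charts):
        A = c['G'][:, :-1]                                  # linear part G_k [I; 0]
        u = np.linalg.pinv(A) @ (x - c['G'][:, -1])         # recover chart coordinate
        y = y + wk * (c['mu'] + c['W'].T @ u)               # lift into sample space
    return y
```

With a single chart whose affine map is the identity, the back-projection simply lifts x onto the chart's subspace through W', as expected.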
Instead, equation (10) generates a surface that passes through the weighted average of the $\mu_i$ of all the neighborhoods in which $y_i$ has nonzero probability, much like a principal curve passes through the center of each local group of points.

5 Experiments

Synthetic examples: 400 2D points were randomly sampled from a 2D square and embedded in 3D via a curl and twist, then contaminated with gaussian noise. Even if noiselessly sampled, this manifold cannot be "unrolled" without distortion. In addition, the outer curl is sampled much less densely than the inner curl. With an order of magnitude fewer points, higher noise levels, no possibility of an isometric mapping, and uneven sampling, this is arguably a much more challenging problem than the "swiss roll" and "s-curve" problems featured in [12, 9, 8, 1]. Figure 2 LEFT contrasts the (unique) output of charting and the best outputs obtained from ISOMAP and LLE (considering all neighborhood sizes between 2 and 20 points). ISOMAP and LLE show catastrophic folding; we had to change LLE's

Figure 3: Untying a trefoil knot by charting (panels: a. data, xy view; b. data, yz view; c. local charts; d. 2D embedding; e. 1D embedding). 900 noisy samples from a 3D-embedded 1D manifold are shown as connected dots in front (a) and side (b) views. A subset of charts is shown in (c). Solving for the 2D connection gives the "unknot" in (d). After removing some points to cut the knot, charting gives a 1D embedding, which we plot against true manifold arc length in (e); monotonicity (modulo noise) indicates correctness.

Three principal degrees of freedom recovered from raw jittered images (rows: pose, scale, expression); images synthesized via backprojection of straight lines in coordinate space.
Figure 4: Modeling the manifold of facial images from raw video.
Each row contains images synthesized by back-projecting an axis-parallel straight line in coordinate space onto the manifold in image space. Blurry images correspond to points on the manifold whose neighborhoods contain few if any nearby data points.

regularization in order to coax out nondegenerate (>1D) solutions. Although charting is not designed for isometry, after affine transform the forward-projected points disagree with the original points with an RMS error of only 1.0429, lower than the best LLE (3.1423) or best ISOMAP (1.1424, not shown). Figure 2 RIGHT shows the same problem where points are sampled regularly from a grid, with noise added before and after embedding. Figure 3 shows a similar treatment of a 1D line that was threaded into a 3D trefoil knot, contaminated with gaussian noise, and then "untied" via charting.

Video: We obtained a 1965-frame video sequence (courtesy S. Roweis and B. Frey) of 20×28-pixel images in which B.F. strikes a variety of poses and expressions. The video is heavily contaminated with synthetic camera jitters. We used raw images, though image processing could have removed this and other uninteresting sources of variation. We took a 500-frame subsequence and left-right mirrored it to obtain 1000 points in 20×28 = 560D image space. The point-growth process peaked just above d = 3 dimensions. We solved for 25 charts, each centered on a random point, and a 3D connection. The recovered degrees of freedom, recognizable as pose, scale, and expression, are visualized in figure 4.

Figure 5: Flattening a fishbowl. From the left: original 2000 2D points; their stereographic mapping to a 3D fishbowl; its 2D embedding recovered using 500 charts; and the stereographic map.
Fewer charts lead to isometric mappings that fold the bowl (not shown).

Conformality: Some manifolds can be flattened conformally (preserving local angles) but not isometrically. Figure 5 shows that if the data is finely charted, the connection behaves more conformally than isometrically. This problem was suggested by J. Tenenbaum.

6 Discussion

Charting breaks kernel-based NLDR into two subproblems: (1) finding a set of data-covering locally linear neighborhoods ("charts") such that adjoining neighborhoods span maximally similar subspaces, and (2) computing a minimal-distortion merger ("connection") of all charts. The solution to (1) is optimal w.r.t. the estimated scale of local linearity r; the solution to (2) is optimal w.r.t. the solution to (1) and the desired dimensionality d. Both problems have Bayesian settings. By offloading the nonlinearity onto the kernels, we obtain least-squares problems and closed-form solutions. This scheme is also attractive because large eigenproblems can be avoided by using a reduced set of charts.

The dependence on r, as in trusted-set methods, is a potential source of solution instability. In practice the point-growth estimate seems fairly robust to data perturbations (to be expected if the data density changes slowly over a manifold of integral Hausdorff dimension), while the use of a soft neighborhood partitioning appears to make charting solutions reasonably stable to variations in r. Eigenvalue stability analyses may prove useful here. Ultimately, we would prefer to integrate r out.
In contrast, the use of d appears to be a virtue: unlike other eigenvector-based methods, the best d-dimensional embedding is not merely a linear projection of the best (d+1)-dimensional embedding; a unique distortion is found for each value of d that maximizes the information content of its embedding.

Why does charting perform well on datasets where the signal-to-noise ratio confounds recent state-of-the-art methods? Two reasons may be adduced: (1) Nonlocal information is used to construct both the system of local charts and their global connection. (2) The mapping only preserves the component of local point-to-point distances that projects onto the manifold; relationships perpendicular to the manifold are discarded. Thus charting uses global shape information to suppress noise in the constraints that determine the mapping.

Acknowledgments
Thanks to J. Buhmann, S. Makar, S. Roweis, J. Tenenbaum, and anonymous reviewers for insightful comments and suggested "challenge" problems.

References
[1] M. Balasubramanian and E. L. Schwartz. The Isomap algorithm and topological stability. Science, 295(5552):7, January 2002.
[2] C. Bregler and S. Omohundro. Nonlinear image interpolation using manifold learning. In NIPS-7, 1995.
[3] D. DeMers and G. Cottrell. Nonlinear dimensionality reduction. In NIPS-5, 1993.
[4] J. Gomes and A. Mojsilovic. A variational approach to recovering a manifold from sample points. In ECCV, 2002.
[5] T. Hastie and W. Stuetzle. Principal curves. J. Am. Statistical Assoc., 84(406):502-516, 1989.
[6] G. Hinton, P. Dayan, and M. Revow. Modeling the manifolds of handwritten digits. IEEE Trans. Neural Networks, 8, 1997.
[7] N. Kambhatla and T. Leen. Dimensionality reduction by local principal component analysis. Neural Computation, 9, 1997.
[8] S. Roweis, L. Saul, and G. Hinton. Global coordination of linear models. In NIPS-13, 2002.
[9] S. T. Roweis and L. K. Saul.
Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, December 22, 2000.
[10] A. Smola, S. Mika, B. Schölkopf, and R. Williamson. Regularized principal manifolds. Machine Learning, 1999.
[11] Y. W. Teh and S. T. Roweis. Automatic alignment of hidden representations. In NIPS-15, 2003.
[12] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, December 22, 2000.
", "award": [], "sourceid": 2165, "authors": [{"given_name": "Matthew", "family_name": "Brand", "institution": null}]}