Minimax Embeddings

Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)

Bibtex Metadata Paper


Matthew Brand


Spectral methods for nonlinear dimensionality reduction (NLDR) impose a neighborhood graph on point data and compute eigenfunctions of a quadratic form generated from the graph. We introduce a more general and more robust formulation of NLDR based on the singular value de- composition (SVD). In this framework, most spectral NLDR principles can be recovered by taking a subset of the constraints in a quadratic form built from local nullspaces on the manifold. The minimax formulation also opens up an interesting class of methods in which the graph is “dec- orated” with information at the vertices, offering discrete or continuous maps, reduced computational complexity, and immunity to some solu- tion instabilities of eigenfunction approaches. Apropos, we show almost all NLDR methods based on eigenvalue decompositions (EVD) have a so- lution instability that increases faster than problem size. This pathology can be observed (and corrected via the minimax formulation) in problems as small as N < 100 points.

1 Nonlinear dimensionality reduction (NLDR) Spectral NLDR methods are graph embedding problems where a set of N points X .= [x1,··· ,xN] ∈ RD×N sampled from a low-dimensional manifold in a ambient space RD is reparameterized by imposing a neighborhood graph G on X and embedding the graph with minimal distortion in a “parameterization” space Rd, d < D. Typically the graph is sparse and local, with edges connecting points to their immediate neighbors. The embedding must keep these edges short or preserve their length (for isometry) or angles (for conformality). The graph-embedding problem was first introduced as a least-squares problem by Tutte [1], and as an eigenvalue problem by Fiedler [2]. The use of sparse graphs to generate metrics for least-squares problems has been studied intensely in the following three decades (see [3]). Modern NLDR methods use graph constraints to generate a metric in a space of embed- dings RN. Eigenvalue decomposition (EVD) gives the directions of least or greatest variance under this metric. Typically a subset of d extremal eigenvectors gives the embedding of N points in Rd parameterization space. This includes the IsoMap family [4], the locally linear embedding (LLE) family [5,6], and Laplacian methods [7,8]. Using similar methods, the Automatic Alignment [6] and Charting [9] algorithms embed local subspaces instead of points, and by combining subspace projections thus obtain continuous maps between RD and Rd. This paper introduces a general algebraic framework for computing optimal embeddings directly from graph constraints. The aforementioned methods can can be recovered as spe- cial cases. The framework also suggests some new methods with very attractive properties, including continuous maps, reduced computational complexity, and control over the degree

of conformality/isometry in the desired map. It also eliminates a solution instability that is intrinsic to EVD-based approaches. A perturbational analysis quantifies the instability.

2 Minimax theorem for graph embeddings

We begin with neighborhood graph specified by a nondiagonal weighted adjacency matrix M ∈ RN×N that has the data-reproducing property XM = X (this can be relaxed to XM ≈ X in practice). The graph-embedding and NLDR literatures offer various constructions of M, each appropriate to different sets of assumptions about the original embedding and its sampling X (e.g., isometry, local linearity, noiseless samples, regular sampling, etc.). Typically Mi j 6= 0 if points i, j are nearby on the intrinsic manifold and |Mi j| is small or zero otherwise. Each point is taken to be a linear or convex combination of its neighbors, and thus M specifies manifold connectivity in the sense that any nondegenerate embedding Y that satisfies YM ≈ Y with small residual kYM− YkF will preserve this connectivity and the structure of local neighborhoods. For example, in barycentric embeddings, each point is the average of its neighbors and thus Mi j = 1/k if vertex i is connected to vertex j (of degree k). We will also consider three optional constraints on the embedding :

  1. A null-space restriction, where the solution must be outside to the column-space of C ∈ RN×M, M < N. For example, it is common to stipulate that the solution Y be centered, i.e., YC = 0 for C = 1, the constant vector.

  2. A basis restriction, where the solution must be a linear combination of the rows of basis Z ∈ RK×N, K ≤ N. This can be thought of as information placed at the vertices of the graph that serves as example inputs for a target NLDR function. We will use this to construct dimension-reducing radial basis function networks.

  3. A metric S ∈ RN×N that determines how error is distributed over the points. For example, it might be important that boundary points have less error. We assume is symmetric positive definite and has factorization S = AA> (e.g., A could that S be a Cholesky factor of S).

In most settings, the optional matrices will default to the identity matrix. In this context, we define the per-dimension embedding error of row-vector yi ∈ rows(Y) to be