NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 635
Title: Selecting the independent coordinates of manifolds with large aspect ratios

### Reviewer 1

The authors propose a criterion and method for selecting independent diffusion coordinates to capture the structure of a manifold with a large aspect ratio. The ideas presented in the paper are original, and the paper is clearly written, well organized, and scientifically sound. The theoretical background and the new analysis are presented clearly. The authors provide sufficient information to allow the method to be reproduced. Simulations support the success of the method, and the method is compared to an alternative approach. The paper addresses a real problem in manifold learning, and the proposed method might be used by others in the future.

I have a few minor concerns:

- The authors do not discuss "Non-Redundant Spectral Dimensionality Reduction" by Michaeli et al., probably unintentionally. I believe that method provides a true alternative to the proposed method, and this should be addressed.
- The choice of the kernel bandwidth ($\epsilon$) is not addressed; this parameter could dramatically affect the results. Moreover, in some cases, if $\epsilon$ is chosen as a diagonal matrix (i.e., a different value for each coordinate), the aspect ratio problem could be fixed (see, for example, "Kernel Scaling for Manifold Learning and Classification").

To summarize, I think the paper should be accepted, and I hope these minor changes can be easily addressed to improve the manuscript.

Response to rebuttal: The authors have addressed all my comments in the rebuttal. My opinion is unchanged: I think the paper should be accepted with the appropriate edits included in the final version.
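To illustrate the bandwidth concern raised above, here is a minimal sketch of a Gaussian affinity kernel that accepts either a scalar $\epsilon$ or one value per coordinate. The function name `gaussian_kernel` and the convention $K_{ij} = \exp(-\sum_k (x_{ik} - x_{jk})^2 / \epsilon_k)$ are assumptions made for illustration; this is neither the authors' implementation nor the procedure of the cited kernel-scaling paper.

```python
import numpy as np


def gaussian_kernel(X, eps):
    """Gaussian affinity matrix with a scalar or per-coordinate bandwidth.

    X   : (n, d) array of points.
    eps : scalar, or length-d sequence of per-coordinate bandwidths
          (hypothetical convention: K_ij = exp(-sum_k (x_ik - x_jk)^2 / eps_k)).
    """
    X = np.asarray(X, dtype=float)
    # A scalar bandwidth is broadcast to every coordinate.
    eps = np.broadcast_to(np.asarray(eps, dtype=float), (X.shape[1],))
    # Dividing coordinate k by sqrt(eps_k) is equivalent to a diagonal bandwidth matrix.
    Xs = X / np.sqrt(eps)
    sq = np.sum(Xs**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Xs @ Xs.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negative values caused by round-off
    return np.exp(-d2)


# Three points: one step of length 2 along x, one step of length 1 along y.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
K_iso = gaussian_kernel(X, 1.0)            # same bandwidth in both coordinates
K_aniso = gaussian_kernel(X, [4.0, 1.0])   # wider bandwidth along the long axis
```

With the scalar bandwidth the two neighbors of the first point receive very different affinities, whereas the per-coordinate bandwidth equalizes them. This is the sense in which rescaling could compensate for a large aspect ratio.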

### Reviewer 2

The authors provide a novel solution to the problem first identified in [DTCK18], that of identifying a parsimonious subset of eigenvectors from a diffusion map embedding. From the perspective of differential geometry, the authors identify a new criterion for evaluating the independence of a set of eigenvectors and use it to identify suitable independent subsets of eigenvectors of the diffusion map. This area of manifold embedding is relatively understudied, and the authors' solution seems elegant, improves on existing work, and is scalable to large datasets. The paper is also accompanied by an impressive number of experiments.

Comments:

1. The authors claim that their method is robust to "noise present in real scientific data". However, it is hard to determine whether this is the case given the examples provided. An experiment on synthetic data with added noise would strengthen this claim.
2. Some of the figures in the main text were difficult to parse. It appears that in Figure 1a the y-axis is mislabeled and contains multiple overlaid plots. It is also difficult to assess the utility of the embeddings provided for the real datasets in Figure 3, as there is no ground truth geometry to reference. It would be useful to know how the coloring used for the Chloromethane dataset was chosen (or what the data actually is) and to have some more interpretation of the utility of the embedding.

Typographical comments:

1. Line 98: "regreesion" -> regression
2. "Chloromethane" is misspelled in the Fig. 3 legend

Based on the strength of the experimental results and the theoretical interpretation of this problem, I recommend accepting this paper.

Update (Aug 11): Reading the other reviews and the author feedback, my opinion of this paper has not changed.
I agree with Reviewer 3 that selecting the ideal subspace for dimensionality reduction (as opposed to the ideal subset of eigenvectors) is an interesting problem that may yield interesting progress in the field. However, that does not detract from the significance of the problem considered in this work. The authors are thorough, and their response to the request for an analysis of robustness to noise is satisfactory. I do not wish to revise my score.

### Reviewer 3

This paper studies the problem of selecting coordinates of a map into a high-dimensional Euclidean space (assumed to be a smooth embedding) to produce a smooth immersion into a lower-dimensional Euclidean space. As the original map is composed of the eigenfunctions of a Laplacian, the authors call this the Independent Eigencoordinate Selection (IES) problem. The main contribution of the paper is to design an objective function that encourages the projected map to be locally injective, together with a regularization term that encourages the use of slowly varying eigenfunctions with low eigenvalues. The IES problem is naturally phrased as a subset selection problem given these choices. The paper does not focus on how to optimize this objective function; rather, the authors study the behavior of the exact solution (found via exhaustive search of small subsets) under changes in the regularization parameter as a *regularization path*.

I would have liked to see more discussion of the particular objective function chosen. Section 5 of the paper states that in the limit of infinitely many samples, the objective function converges to a K-L divergence between two Riemannian volume forms, one of them a pullback and the other cooked up to rescale the pullback. It seems that this limit is intended to motivate the choice of objective function. In that case, it would have been helpful to introduce it earlier and to discuss it more: e.g., why is K-L between these two volume forms a good way to encourage local injectivity?

More fundamentally, the IES problem chooses a composition of the original map with a very specific Euclidean projection: a projection along coordinate axes. Searching over subsets of coordinates seems hard in general (this paper mainly uses exhaustive search). Why is it better to search among subsets of the coordinates than to search over all projections, which would be more amenable to continuous optimization techniques (e.g., manifold optimization on the Grassmannian)?
I found the section on the regularization path and choosing $\zeta$ hard to follow. It seems to use notation introduced in the supplementary material without referring to it.

Some of the mathematical terminology and notation in the paper is non-standard. For example, the pullback of a metric is normally denoted $\phi^*g$, not $g_{*\phi}$. The paper refers to the pushforward of the metric, which is really the pullback by the inverse map, $(\phi^{-1})^*g$. Of course, this only makes sense where the inverse is well-defined. Similarly, the classification of functional dependencies/knots and crossings may be standard in machine learning, but as far as I know mathematicians would call these failures of local injectivity and failures of injectivity, respectively. A map that is locally (infinitesimally) injective but not necessarily globally injective is an immersion. It would be helpful to use this standard term, as this is what the paper is seeking.

In Section 5, some notation is used without being introduced. For example, I do not see where $p$ is defined, nor what $\sigma_k(y)$ is. The Jacobian determinant is defined as the volume of a matrix, which seems like a typo.

The characterization of the regularization parameter $\zeta$ is inconsistent. For example, Section 5 states that "a smaller value for the regularization term encourages the use of slow varying coordinate functions." In fact, increasing $\zeta$ should put more emphasis on low-frequency modes. The paper states that "The rescaling of $\zeta$ [in equation (10)] in comparison with equation (2) aims to make $\zeta$ adimensional." But it is also stated that the objective function $\mathfrak{L}$ from equation (2) converges to that in equation (10). In that case, the scaling should be consistent between the two equations. If adimensionality is desirable, why not aim for that in the original definition of the objective function?
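For reference, the standard conventions behind the terminology remarks above (pullback, pushforward, immersion) can be written as follows. The symbols $M$, $N$, $u$, $v$ are chosen here for illustration and are not taken from the paper; $g$ is assumed to be a Riemannian metric on the relevant manifold.

```latex
% Pullback of a metric g on N by a smooth map \phi : M \to N
(\phi^{*} g)_p(u, v) \;=\; g_{\phi(p)}\bigl(d\phi_p\, u,\; d\phi_p\, v\bigr),
\qquad u, v \in T_p M .

% Pushforward of a metric g on M, defined only where \phi is a
% diffeomorphism onto its image \phi(M)
\phi_{*} g \;=\; (\phi^{-1})^{*} g \quad \text{on } \phi(M) .

% \phi is an immersion iff d\phi_p is injective at every point,
% equivalently iff \phi^{*} g is positive definite everywhere
\operatorname{rank}(d\phi_p) \;=\; \dim M \quad \text{for all } p \in M .
```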