NIPS 2016
Mon Dec 5th through Sun the 11th, 2016 at Centre Convencions Internacional Barcelona
Paper ID: 56
Title: High-Rank Matrix Completion and Clustering under Self-Expressive Models

Reviewer 1

Summary

The authors propose an algorithm for clustering high-dimensional data points drawn from a union of low-dimensional subspaces in the setting of incomplete data, i.e., when some of the entries of the given data points are missing. Their algorithm is related to the sparse subspace clustering (SSC) algorithm by Elhamifar and Vidal but, unlike that algorithm, solves a convex relaxation of a lifted formulation, corresponding to a group-sparse and low-rank recovery problem, to determine the coefficients that define the similarity graph. The competitive performance of the algorithm is demonstrated in experiments with synthetic and real data. The experiments suggest that the algorithm beats the state of the art on high-rank data matrices.
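For context, a minimal sketch of the fully observed SSC self-expression step that the review refers to (not the authors' lifted formulation for missing data); the Lasso weight `lam` is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso

def ssc_coefficients(Y, lam=0.01):
    """Sparse self-expression: write each column of Y as a sparse
    combination of the remaining columns (Elhamifar & Vidal)."""
    D, N = Y.shape
    C = np.zeros((N, N))
    for i in range(N):
        mask = np.arange(N) != i                      # exclude y_i itself
        lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        lasso.fit(Y[:, mask], Y[:, i])
        C[mask, i] = lasso.coef_                      # c_ii = 0 by construction
    return C
```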

Qualitative Assessment

The paper is well-structured and well written and deals with a relevant subject. The authors' approach of tackling the completion and clustering simultaneously seems very interesting and looks promising in numerical experiments, but a theoretical analysis of the procedure is not provided. I feel that the good experimental results justify publication. In order to understand the authors' algorithm better, it would be helpful for the reader if the clustering part of the procedure, which is based on spectral clustering, were explained more precisely. In the supplementary material about column subset selection and completion via lifting (Part 3), there is a confusing typo: in the formulas (26), (27) and (28), "$c_{i,1}x_i, \cdots, c_{i,N}x_i$" should be replaced by "$\alpha_{i,1},\cdots,\alpha_{i,N}$".
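The spectral clustering back-end the reviewer asks about is, in standard SSC, applied to a symmetrized affinity built from the coefficient matrix; a minimal sketch, assuming the usual |C| + |C|^T affinity (which the paper may refine):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_coefficients(C, n_clusters):
    """Standard SSC back-end: symmetrize the coefficient matrix into a
    non-negative affinity and run spectral clustering on that graph."""
    W = np.abs(C) + np.abs(C).T
    sc = SpectralClustering(n_clusters=n_clusters,
                            affinity="precomputed",
                            assign_labels="discretize")
    return sc.fit_predict(W)
```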

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 2

Summary

The paper proposes an algorithm for jointly learning a similarity graph (which can be used for subsequent clustering) and completing missing data. It is an extension of the Sparse Subspace Clustering (SSC) algorithm, which addresses the learning of a similarity graph for clustering using constraints that make it favorable for each data point to belong to one of several low-dimensional subspaces. The problem is formulated as a constrained optimization, and several relaxations are used to arrive at a convex objective that can then be optimized. Synthetic experiments show the method is favorable for an increasing percentage of missing data and for high-rank data matrices, across parameters such as the ambient space dimensionality and the number of points in each subspace. Real-data experiments show either competitive results for low-rank or superior results for high-rank data matrices.

Qualitative Assessment

In general, the method seems useful and the experiments are solid. There are some questions around whether it can scale to large data sets, or even medium-sized ones. Another point needing more clarity is how difficult it is to set the optimization parameters \gamma and \lambda. There is some exploration of different ranges for synthetic data, but some more discussion is needed on how to set them in real-life usage. The paper is clearly written and easy to follow. From the point of view of contribution, it builds heavily on the SSC framework, effectively introducing the missing-data aspect to the original algorithm.

Minor:
- line 25: loose -> lose
- eq (2): for completeness of notation, please define c_{ij} and its domain.
- \rho is used both in (11), as a functional form of the exact reconstruction constraint, and again from line 226 onwards for the percentage of missing entries.
- The paper should define what incoherent subspaces are. I talked to 3 different mathematicians (in probability theory; 2 are professors) and they were not familiar with the term, so it is safe to say it is not commonly known; defining it would help the average reader.
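On the question of setting \gamma and \lambda raised above, a hypothetical grid-search loop might look like the following; `solve_and_cluster` and `clustering_error` are placeholders for the authors' solver and an error metric, not their actual API:

```python
import itertools
import numpy as np

def tune_parameters(Y_obs, mask, labels, solve_and_cluster, clustering_error):
    """Grid-search the two regularization weights; the best pair is the
    one with the lowest clustering error against reference labels."""
    best_params, best_err = None, np.inf
    for gamma, lam in itertools.product(np.logspace(-3, 1, 5),
                                        np.logspace(-3, 1, 5)):
        pred = solve_and_cluster(Y_obs, mask, gamma, lam)
        err = clustering_error(labels, pred)
        if err < best_err:
            best_params, best_err = (gamma, lam), err
    return best_params, best_err
```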

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 3

Summary

This paper proposes a new algorithm for simultaneous clustering and completion of incomplete high-dimensional data. The authors use a lifting technique to reformulate the task as a low-rank, sparse optimization problem. The basic idea is to introduce a larger matrix variable that removes the bilinear factorization structure of the original variables. Furthermore, they apply a convex relaxation to solve the problem. Extensive experiments on synthetic data and real problems show that the proposed method is effective.
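To make the lifting-plus-relaxation pattern concrete, here is a generic convex surrogate in the same spirit: a single lifted variable A stands in for a bilinear product, rank is relaxed to the nuclear norm, and group support to a mixed l2/l1 norm over columns. This is a hedged illustration of the pattern, not the paper's actual program; `gamma` and `lam` are illustrative weights:

```python
import cvxpy as cp
import numpy as np

def lifted_relaxation(Y_obs, mask, gamma=1.0, lam=1.0):
    """Generic lifted-style surrogate: one matrix variable A replaces a
    bilinear product; rank is relaxed to the nuclear norm and group
    support to a sum of column l2 norms. mask is 1.0 where observed."""
    D, N = Y_obs.shape
    A = cp.Variable((D, N))
    data_fit = cp.sum_squares(cp.multiply(mask, A - Y_obs))
    objective = cp.Minimize(data_fit
                            + gamma * cp.norm(A, "nuc")
                            + lam * cp.sum(cp.norm(A, axis=0)))
    cp.Problem(objective).solve()
    return A.value
```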

Qualitative Assessment

Strong points:
-- A new simultaneous sparse subspace clustering and completion model is proposed.
-- A new convex relaxation technique is considered. The lifting technique is novel; it is achieved by exploiting the specific structure of the projection matrix.
-- Experiments have shown that the proposed algorithm outperforms existing algorithms, especially when the solution is of high rank.

Weak points:
-- The authors should include the computational complexity of the lifting algorithm. Moreover, a computational comparison with the competing methods should be reported.
-- Although the model in Eq. (2) is reasonable, I think the authors should explain the difference between Eq. (2) and the standard SSC model in [27].
-- It is not known whether the convex relaxation is tight, especially when the resulting rank is very high.

Minor issue:
-- It would be better to add 1-2 sentences briefly introducing the basic idea of the lifting technique at the beginning of Section 4.1.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 4

Summary

The paper presents an extension of the sparse subspace clustering (SSC) algorithm that completes missing values while also recovering the sparse representation of the data. With the sparse representation at hand, the clustering is performed in exactly the same way as in the original SSC algorithm. The authors show that joint matrix completion and sparse learning may yield better clustering results than performing the two steps independently.
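The sequential two-step baseline the reviews contrast against (complete first, then cluster) can be sketched as follows; this assumes a simple singular-value-thresholding heuristic for the completion step, not necessarily the LRMC solver the paper compares against, and its output would then be fed to an SSC step like the one sketched under Reviewer 1:

```python
import numpy as np

def lrmc_svt(Y_obs, mask, tau=5.0, iters=200):
    """Iterative singular-value thresholding with the observed entries
    re-imposed after every step; the completed matrix would then be
    passed to an SSC step for clustering."""
    X = np.where(mask, Y_obs, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)  # shrink spectrum
        X = np.where(mask, Y_obs, X)                      # keep known entries
    return X
```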

Qualitative Assessment

Major comments: The paper describes an interesting extension of the SSC algorithm that allows joint matrix completion and sparse representation. The mathematical formulation is straightforward, and the main contribution seems to be in the way the problem is made computationally tractable. I did not find any issues with the technical approach, and I think the contribution in that respect is sound. On the downside, I was not convinced of the practical importance of this framework. Yes, joint matrix completion and subspace clustering seems like an interesting theoretical problem to solve, but do we have a practical real-life example where the benefits of this framework would become evident, particularly compared to the approach where one performs matrix completion and clustering in sequence (especially since LRMC+SSC performed quite decently)?
- The paper shows that the proposed approach outperforms the competition in the matrix completion task, but the specific scenario under which one would be interested in matrix completion accuracy feels contrived.
- Additionally, I think the "real" experiments presented in the paper are in fact simulated. They were performed on data that was generated by simulating particular distributions, using realistic datasets as the basis. These are not experiments on realistic datasets that would justify the applicability of the approach in real life.

Minor comments:
1. I found the notation paragraph (lines 82-98) somewhat confusing, primarily because it felt unnecessarily generic. For example (see the small sketch after this review):
- Why not state simply that \bar{y_i} is the original data point with its missing values replaced by zeros?
- Why not describe U_{\Omega_j} in simple words? Also, the matrices U and P are denoted as matrices of real values, while in reality they are matrices over {0,1}.
2. The font of the references section seems very small.
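A small numpy illustration of the notation in question, under the reviewer's reading of it (zero-filling and binary row selection); the values are arbitrary:

```python
import numpy as np

def projection_matrix(omega, D):
    """Binary row-selection matrix U_Omega: entries are in {0, 1}."""
    U = np.zeros((len(omega), D))
    U[np.arange(len(omega)), omega] = 1.0
    return U

y = np.array([3.0, np.nan, 5.0, np.nan])   # a data point with missing entries
omega = np.where(~np.isnan(y))[0]          # observed index set {0, 2}
U = projection_matrix(omega, y.size)
y_obs = U @ np.nan_to_num(y)               # the observed entries only
y_bar = np.nan_to_num(y)                   # missing values replaced by zeros
```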

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 5

Summary

The paper considers a data matrix in which each column is a combination (linear or affine) of only a few of the other columns. The primary objective is to find clusters of columns such that the columns within a cluster are combinations of columns within the same cluster. In addition, some of the columns can have missing entries. The paper gives a lifting-based algorithm that simultaneously completes the missing entries and does the clustering.

Strength: There already exist many algorithms for this problem which do reasonably well when the data matrix is low-rank. The existing algorithms do not work well on high-rank matrices. The proposed algorithm seems to work well on both low-rank and high-rank matrices, on both synthetic and real data.

Weakness: The authors do not provide any theoretical understanding of the algorithm.
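To make the low-rank/high-rank distinction concrete, a sketch of a standard synthetic setup (all parameter values illustrative): each cluster lies in a d-dimensional subspace, and once n_sub * d exceeds the ambient dimension D, the full matrix is high-rank even though every cluster is low-rank:

```python
import numpy as np

def union_of_subspaces(D=30, d=5, n_sub=10, pts=40, miss=0.3, seed=0):
    """Each cluster lies in a random d-dimensional subspace of R^D;
    with n_sub * d > D the stacked matrix is full-rank ('high-rank')."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for k in range(n_sub):
        basis = np.linalg.qr(rng.standard_normal((D, d)))[0]
        blocks.append(basis @ rng.standard_normal((d, pts)))
        labels += [k] * pts
    Y = np.hstack(blocks)
    mask = rng.random(Y.shape) >= miss     # True where an entry is observed
    return Y, mask, np.array(labels)
```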

Qualitative Assessment

The paper seems to be well written. The proposed algorithm seems to work very well on the experimental setup, using both synthetic and real-world data. The contributions of the paper are enough to be considered for a poster presentation. The following concerns, if addressed properly, could raise it to the level of an oral presentation:
1. The paper does not provide an analysis of what type of data the algorithm works best on and what type of data the algorithm may not work well on.
2. The first claimed contribution of the paper is that, unlike other existing algorithms, the proposed algorithm does not require as many points and does not need a priori knowledge about the dimensions of the subspaces. It would have been better if there were some empirical justification of this.
3. It would be good to show some empirical evidence that the proposed algorithm works better for the Column Subset Selection problem too, as claimed in the third contribution of the paper.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 6

Summary

This paper proposes an algorithm to address the issue of missing data in the subspace clustering problem. The idea is well explained and easy to follow. The heuristic in the first step is the same as in the original SSC, i.e., representing a data point by a sparse linear combination of the other data points. There are two sources of non-convexity: 1. the L_0 norm in the objective; 2. the bilinear term in the constraint. To deal with the non-convexity, the authors reformulate the problem and then relax it into a convex problem (e.g., via the nuclear-norm heuristic for low-rank matrices). Experimental results are given on both synthetic and real datasets.
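For readers unfamiliar with the two convex surrogates involved, a tiny numerical illustration (values chosen arbitrarily): the l1 norm replaces the l0 count, and the nuclear norm (the sum of singular values) replaces the rank:

```python
import numpy as np

# l0 "norm" vs its convex surrogate, the l1 norm:
c = np.array([0.0, 3.0, 0.0, -1.0])
print(np.count_nonzero(c), np.abs(c).sum())    # 2 (l0) vs 4.0 (l1)

# rank vs its convex surrogate, the nuclear norm:
A = np.outer(np.ones(3), np.arange(3.0))       # a rank-1 matrix
s = np.linalg.svd(A, compute_uv=False)
print(np.sum(s > 1e-9), s.sum())               # 1 (rank) vs nuclear norm
```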

Qualitative Assessment

The paper is well written and pleasant to read. Although the algorithm seems to have some novelty, I am unconvinced of this paper's contribution, especially the lack of theoretical analysis. First question: how many missing entries can it tolerate? The answer would be related to the dimension, the number of data points, the angles between different subspaces, and other factors. It would be interesting to provide such a result and to check whether the experiments validate it, even just on the synthetic dataset. Second question: are there performance guarantees for the convex relaxations, i.e., the relaxation of the L_0 norm in the objective and of the rank-1 constraint? In previous work, such as "A geometric analysis of subspace clustering with outliers", the result of the L_1 relaxation is well explained. I would like to see similar discussions and results.
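The tolerance question could at least be probed empirically with a sweep like the following; `run_trial` is a placeholder that generates one random instance at missing rate rho, runs the method, and returns its clustering error:

```python
import numpy as np

def missing_rate_sweep(run_trial, rates=np.linspace(0.0, 0.9, 10), trials=5):
    """Average the clustering error of run_trial(rho) over several
    random instances at each missing rate rho."""
    return {float(rho): float(np.mean([run_trial(rho) for _ in range(trials)]))
            for rho in rates}
```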

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)