Paper ID: 780
Title: Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation
Current Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper discusses rank detection in the framework of matrix completion (where exact reconstruction is not necessarily the aim). The authors present a new algorithm which is replacing the first two steps of OptSpace (for state-of-the-art low-rank matrix reconstruction) with another spectral procedure to detect hidden rank (and thereby structure).

The paper is well written overall. It describes the problem, prior art and the context. However, it would be useful to show the steps of OptSpace (not just use a cite) since the proposed algorithm is a close derivative.

This reviewer also would like to see a discussion on how the proposed first steps differ from trimming used by OptSpace. Consider Fig 1 on [3] and compare. The section 3.2 on spectral density makes steps towards that and is a well written section. However it is not well motivated and put into context for this reviewer. In short it would tremendously help the presentation if the authors would discuss how the spectral method replacing trimming of [3] differ exactly and contributes to the rank detection.

The paper has a good supporting numerical test section.

Further detailed comments:

69: The authors appear to use X^\dagger for the transpose of X (based on context). The NIPS audience would expect X^T. (Using X^\dagger for the transpose is especially confusing because occasionally it is used to denote the pseudo inverse.)

123: The definition of the Bethe Hessian is much different then in the cited [5] for Bethe Hessian. Show relationship if any to definition in [5]. Give some insight to the definition since it is not covered in the cited work. This reviewer have not heard about the Bethe Hessian before, and assumes the same will be true for a significant portion of the NIPS audience. Also note line 199: "matrix called Bethe Hessian in [5]". This is confusing and appears contradictory to the uninitiated reader.

128: "we will fixed"

132: Consider defining F(\beta) and then define \hat{\beta_{SG}}. (Side note on notation: why \hat and why subscript SG and most importantly why both?)

144: Computational cost of finding \beta_SG? Is the alternative involving eigencomputations of the Bethe Hessian equivalent to solving F(\beta) = 1? If so why? If not, why would we use the alternative that appears expensive?

234: Spectral density appears first in Fig 1. No context.

302: Section on computation of spectral density is not motivated. The reader wonders why are we doing this?
Q2: Please summarize your review in 1-2 sentences
The paper is well written and presents interesting material and recommended for acceptance assuming the detailed comments are addressed by the authors.

Submitted by Assigned_Reviewer_2

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper studies the problem of estimating the matrix rank explicitly before running a matrix completion methods. The paper proposes to use the Bethe Hessian matrix to estimate the rank. It also gives theoretical bound on how many entries do the algorithm need to estimate the rank. The paper empirically compares the proposed algorithm with optSpace.

The paper uses the random matrix setting. How will this assumption compared to the incoherence assumption in classical matrix completion paper?

It is better to give comparison on theoretical results with the optSpace
Q2: Please summarize your review in 1-2 sentences
The paper is an extension of the optSpace methods by calculating the rank explicitly. It contains solid theoretical results and empirical experiments.

Submitted by Assigned_Reviewer_3

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary: In this paper authors propose a new way to obtain estimate for singular vectors of an exactly low rank matrix from observing only few entries. This is useful in applications like matrix completion to initialize algorithms helping in faster recovery. The proposed algorithm (based on recent ideas of block model learning in sparse regime) needs less (in constants) samples than existing algorithm and has better sample complexity. Proposed algorithm also can estimate the rank of the matrix from samples better than simple thresholding techniques.

Major comments:

1. The paper considers random matrix model (each entry is iid Gaussian) which is much restrictive than the standard incoherence assumptions. It is not clear if the analysis can be carried forward in this case and if there will be improvement in sample complexity.

2. Alternating least squares (ALS) is perhaps the most popular algorithm in practice for matrix completion, which has been observed to recover the matrix from small number of samples in practice. Authors do not compare with this algorithm and missed some references http://arxiv.org/abs/1212.0467 and http://arxiv.org/abs/1312.0925 .

3. The title is misleading as authors only show improvements in sample complexity for initialization. It is not clear if matrix completion (i.e the second step of recovery) also can be done with fewer samples.

Minor comments:

1. Typo in line 127

2. Line 248: 'lines of X and Y' == columns of X and Y?
Q2: Please summarize your review in 1-2 sentences
This paper proposes a new way to approximate singular vectors of a matrix from few entries with better sample complexity using the ideas similar to backtracking matrix. However will this help in improving the sample complexity of the overall objective (matrix completion) is not clear.

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
A Bethe Hessian based method (MaCBetH) is proposed by the authors, which can estimate a finite rank reliably from fewer random entries and gives lower root-mean-square reconstruction errors. There is an improvement presented on OptSpace by detecting rank and providing better initial conditions for error minimization. The MaCBetH algorithm is based on eigenvalue decomposition of a sparse, symmetric matrix. Therefore, it is more efficient than the Hopfield model using the weighted non-backtracking matrix. The paper could be improved if more in-depth experimental discussion were provided. For example, not much discussions are provided for the results presented in Figure 4. Simply relying on the instruction by caption, it is very hard to understand why the method outperforms other methods.
Q2: Please summarize your review in 1-2 sentences
The authors propose a novel MaCBetH algorithm based on the Bethe Hessian matrix for rank estimation and spectral detectability.

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes a method for estimating the rank of a low-rank incomplete data matrix, based on the so-called Beth free energy, which is an energy function used in statistical mechanics. The authors use the incomplete data matrix to define a graphical model on a bipartite graph, corresponding to the rows and columns of the data matrix. This graphical model is further studied by an approximation called the Beth free energy (BFE), whose phase transition is known in statistical mechanics: as a function of a certain parameter beta, the BFE has a single global optimum for small enough beta. As beta increases a few local minima appear, called the retrieval states. As beta keeps increasing a phase transition occurs in that many spurious local minima appear. The authors use a correspondence between statistical mechanics and the appearence of phase transitions in the graphical model, to estimate the rank of the underlying matrix as the number of retrieval states. This correspondence enables the authors to estimate the minimum number of observed entries per row required to correctly estimate the rank of the matrix. Experiments using synthetic data demonstrate the merit of the proposed method.

From reading the paper it is clear that the authors are very knowledgable as far as the Beth approximation is concerned, and their literature review on matrix completion is fair. In addition, their introduction up to the end of page 2 is well written and comprehensive. More importantly, the idea of using the Beth free energy for the purpose of rank-estimation of low-rank incomplete matrices seems original and as the synthetic experiments demonstrate, quite promising. I believe that the right way to judge this paper is as a work that attempts to bring together notions from machine learning (matrix completion) and notions from statistical mechanics (phase transition of Beth free energy). From such a viewpoint the paper fails, as the authors do not clearly establish the correspondence that they are using to translate the physics notions to the machine learning notions. Let me be precise. In the paper, there seems to be a length-2 chain of correspondences: Matrix Completion - Graphical Models - Statistical Mechanics. As far as the first correspondence is concerned, the authors give no motivation or intuition at all at writing down graphical model (6). What does this model mean in terms of matrix completion? Next, from graphical model (6) the authors pass to the Beth free energy (7), for which they claim that its phase transition is well known. But in line 182 they claim without further comment, perhaps the most important claim of this paper, that a good estimate for the rank of the matrix is the number of retrieval states of (7). Where does that come from? I believe the authors should very carefully justify this claim, as this seems to be the backbone of the paper. Next, at lines 194-197 they say that to study the retrieval states they look at the negative eigenvalues of the Hessian of the BFE at its global minimum (paramagnetic state). Again, how is this justified? At lines 266-269 the authors state an important characterization of retrieval states once again without justification.

Naturally, i do not expect the authors to reproduce a possibly large amount of results from statistical mechanics so that all arguments are self-contained and rigorous. But i do expect them to tie connections on the high-level, and in the low-level wherever possible, with the problem they are trying to solve, which is estimating the rank and more importantly, they should spend more space on establishing very clearly the correspondences that are being exploited.

This paper has the potential of becoming a good paper subject to the authors rewriting it in a more accessible way, and establishing carefully all important connections for their purpose between matrix completion and statistical mechanics.

As the paper stands, i can not recommend it for publication, since i believe that many readers would not even understand

its basic ideas.
Q2: Please summarize your review in 1-2 sentences
This paper has the potential of becoming a good paper subject to the authors rewriting it in a more accessible way, and establishing carefully all important connections for their purpose between matrix completion and statistical mechanics.

As the paper stands, i can not recommend it for publication, since i believe that many readers would not even understand

its basic ideas.

Author Feedback
Author Feedback
Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their appreciation of our work, and especially for the high quality of their reviews and the valuable feedback. It has helped us to identify where our presentation should be improved.

* Referee 1

- While trimming is a very good solution to remove problems due to high-degree nodes in spectral methods, recent developments in community detection have shown that an even better one (that does not erase information) is to use the non-backtracking matrix [8] which is equivalent to the Bethe Hessian [5]. Adapting this strategy in the context of completion is at the roots of our approach. Additionally, in practice, we observed that trimming makes a difference only in very large systems, while our approach is effective even at low sizes.

123: It is in fact the same matrix as in [5], rewritten using a trigonometric relation which we shall explicit.

144: The alternative is indeed more computationally expensive, but it provides more robust estimation of the \beta_SG than the simpler but approximate formula (5). However, some imprecision on the value of \beta_SG, does not seem to change noticeably the numerical experiments that we have perform (which is a good property).

234-302: We agree we should have put the computation in context and will rewrite this paragraph. Our main point is that, regardless of the motivation regarding the use of the Bethe Hessian given in Sec. 3, its spectral properties, that are crucial for understanding its performance, can be studied analytically.

* Referee 2

Random matrix setting assumption: This is definitely something that we need to clarify in the revised version. In a nutshell:
- Because the positions of the revealed entries are assumed random, we do NOT need iid distribution of the entries of X and Y. If, for instance, incoherence is present, then our analysis generalizes as long as the empirical distribution of the entries of the factors X and Y is well defined and non-degenerate (i.e. not a single delta function) in the large size limit.
- In the paper, for simplicity, we restrict our analysis to the case where the r vectors have the same, symmetric, empirical distribution. Note that the value the constant C(r) depends on this empirical distribution.

- We would gladly give a theoretical (and not only numerical) comparison with OptSpace, however, our analysis of Macbeth is not applicable to OptSpace and the theorems proven for OptSpace do not explicit (or overestimate) the constant C(r).

* Referee 3

1 see in answer to Referee 2.

2 We will add a discussion regarding ALS, as we should have done. Note however that extensive simulations made in https://stacks.stanford.edu/file/druid:qz136dw4490/thesis-augmented.pdf shows empirically that Optspace outperforms ALS in the cases we are interested in (see Fig. 6.1 and 6.2 in the link) while we observed empirically that macbeth outperforms Optspace.

3 Fig. 4 shows that we obtain much lower errors in reconstruction, and almost perfect ones, for lower observed fraction than Optspace. This, we believe, justifies our title.

* Referee 4

- We shall give a more detailed account of the experimental results in Fig. 4 in the text.

* Referee 7

We thank the referee for pointing out in detail places where the paper is not clear in the connections to statistical mechanics. This is very valuable to us and we will improve the presentation along the suggested lines. In particular the reasoning we will clarify in section 3 is:

(1) The use of the Hopfield model (6) is motivated by the fact that the Hebb rule takes the form of the low-rank decomposition of the matrix M, eq. (1). The original purpose of the Hopfield model is to memorize the r patterns as thermodynamic phases.
(2) We then explain the connection between number of phases and negative directions in the Hessian of the Bethe free energy. This then leads directly to the Macbeth spectral algorithm.

- The patterns (lines of matrices X and Y) correspond to the minima of the free energy function at low temperature. This follows from the original work of Hopfield himself.
- To identify these minima efficiently with a spectral method, we study first the free energy at large temperature, where there is only a unique trivial minimum, and look for the moment upon increasing \beta where this trivial minimum becomes unstable while the informative minima appear. At this point, the Hessian of the free energy has negative eigenvalues correlated with the informative minima (by the very definition of the Hessian). We shall add a figure to explain this construction.