
Submitted by
Assigned_Reviewer_4
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes low-rank sparse subspace
clustering (LRSSC), a combination of SSC [7] and LRR
[15]. LRSSC minimizes a weighted sum of the trace norm and the
l1 norm of the representation matrix under the self-expressiveness and
zero-diagonal constraints (Eq. (1)). Theoretical guarantees of the
self-expressiveness property (SEP) are obtained in the deterministic
(Theorem 1) and random (Theorem 2) setups. Graph connectivity of
LRR (not of LRSSC) is also discussed (Proposition 1).
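From the summary above, the LRSSC program (Eq. (1) of the paper) presumably takes the following form; this is a reconstruction from the description here, not copied from the paper:

```latex
\min_{C} \; \|C\|_{*} + \lambda \|C\|_{1}
\quad \text{s.t.} \quad X = XC, \;\; \operatorname{diag}(C) = 0
```

Here X is the data matrix, \|C\|_* is the trace (nuclear) norm, \|C\|_1 is the entry-wise l1 norm, and lambda trades off the LRR-like and SSC-like terms.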
I think this
is a nice paper with several strong points and a few minor weak points
(listed below). Although the idea of combining the trace norm and the
l1 norm is not new (as mentioned in the paper), developing its
theoretical guarantees is a significant contribution. Furthermore,
the theoretical results not only show the advantage of the proposed LRSSC,
but also imply the complementary nature of SSC and LRR. This is useful
information for researchers working on subspace clustering, and would
influence their future research.
Pros:
- The proposed LRSSC comes with theoretical guarantees.
- The theoretical and experimental results reveal the strengths and weaknesses of SSC and LRR, and justify the proposed combination. These insights are also useful for researchers working on SSC and LRR.
- A new definition of subspace incoherence (the minimax subspace incoherence property) is introduced, which leads to tighter bounds for SSC than those previously obtained in [20]. This new definition might be used by other theoreticians to develop better guarantees for related methods.
Cons:
- The idea of combining the trace norm and the l1 norm is not new.
- The relation between the obtained theoretical bounds and the experimental results is not clear (see the first item in 'Other comments' below).
Quality: The paper is technically sound and the claims
are supported by theory and experiment.
Clarity: The paper is
clearly written and well-organized.
Originality: Although the
combination of the l1 norm and the trace norm has been used in a
different context [19] (as mentioned in the paper), developing
theoretical guarantees is a significant contribution. The differences
from previous work are clearly stated.
Significance: The
paper provides important information. It theoretically and experimentally
shows the complementary nature of SSC and LRR, and justifies the
proposed combination. The information revealed in this paper on the pros
and cons of SSC and LRR would influence researchers working on subspace
clustering when they use existing methods, develop new approaches, and
pursue a deeper understanding of what sparsity and low-rankness of the
representation matrix mean for subspace clustering.
Other comments:
- It is not clear whether the experiments in Section 6.1 support Theorem 2. If possible, it would be preferable to show in Figure 1 the range in which Theorem 2 guarantees SEP. If the whole experimental condition is outside the guaranteed range, a discussion of how loose the guarantees are would be informative.
- Since LRSSC with lambda -> infinity under the zero-diagonal constraint is not the same as LRR, it would be informative to show the result of LRR without the zero-diagonal constraint in Figure 1. If it is indistinguishable from LRSSC with lambda -> infinity, please just say so.
- In Remark 2, I do not really agree that 'this seems to imply that the independent subspace assumption used in [15, 16] to establish sufficient conditions for LRR to work is unavoidable', since it has not been shown that SEP cannot be achieved with LRR unless the subspaces are independent. If the authors got some insights from the proof, as mentioned in the Introduction, an explanation in the main text of why they think so would be helpful.
Q2: Please summarize your review in 1-2 sentences
A nice paper that proposes a combination (LRSSC)
of SSC and LRR and gives it theoretical guarantees. The
theoretical and experimental results support the usefulness of LRSSC,
and imply the complementary nature of SSC and LRR, which is useful
information for researchers working on subspace clustering.
Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
SUMMARY: This paper proposes a new subspace clustering
algorithm called Low Rank Sparse Subspace Clustering (LRSSC) and aims to
study the conditions under which it is guaranteed to produce a correct
clustering. The correctness is defined in terms of two properties. The
self-expressiveness property (SEP) captures whether each data point is
expressed as a linear combination of other points in the same subspace.
The graph connectivity property (GCP) captures whether the points in one
subspace form a connected component of the graph formed by all the data
points. The LRSSC algorithm builds on two existing subspace clustering
algorithms, SSC and LRR, which have complementary properties. The solution
of LRR is guaranteed to satisfy the SEP under the strong assumption of
independent subspaces and the GCP under weak assumptions (shown in this
paper). On the other hand, the solution of SSC is guaranteed to satisfy
the SEP under milder conditions, even with noisy data or data corrupted
with outliers, but the solution of SSC need not satisfy the GCP. This
paper combines the objective functions of both methods with the hope of
obtaining a method that satisfies both SEP and GCP for some range of
values of the relative weight between the two objective functions. Theorem
1 derives conditions under which LRSSC satisfies SEP in the deterministic
case. These conditions are natural generalizations of existing conditions
for SSC. But they are actually weaker than existing conditions. Theorem 2
derives conditions under which LRSSC satisfies SEP in the random case
(data drawn at random from randomly drawn subspaces). Overall, it is shown
that when the weight of the SSC term is large enough and the ratio of the
data dimension to the subspace dimension grows with the log of the number
of points, then LRSSC is guaranteed to satisfy SEP with high probability.
I say high because it does not tend to 1. Finally, Proposition 1 and
Lemma 4 show that LRR satisfies GCP (presumably almost surely).
Experiments support that LRSSC works for a range of the SSC weight.
Additional experiments on model selection show the usefulness of the
analysis.
QUALITY: This is an excellent paper. It addresses a
difficult and important problem. It derives new conditions that advance
the state of the art for the subspace clustering problem. For the first
time, easy-to-verify conditions for the case of non-independent subspaces
are presented. The conditions are clean, intuitive, and useful. Moreover,
for the first time, an attempt has been made to study the SEP and GCP
properties together. While the story is not complete yet, this is a clear
step in the right direction. That said, I do have some comments to improve
the paper. Most notably, the discussion in Section 4 needs improvement.
Also, there are a number of imprecisions in the literature review that
need to be corrected. Finally, the experiments on Hopkins 155 are not
entirely clear. My detailed comments are as follows:
- For data drawn from a single subspace, I was expecting graph
connectivity to be defined as the second eigenvalue of the Laplacian
being larger than zero, where Laplacian = Degree - Affinity, and
Affinity = C + C^T. Somehow, the authors limit themselves to showing
that all the entries of C are nonzero. It is probably trivial, but
I would have preferred making the connection with connectedness
(based on the eigenvalue of the Laplacian) more explicit.
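The connectivity criterion described above can be sketched as follows. This is a minimal illustration, not the paper's code; the example matrices C_split and C_chain are hypothetical representation matrices used only to exercise the check:

```python
import numpy as np

def is_connected(C, tol=1e-8):
    """Check graph connectivity via the second-smallest eigenvalue
    (the algebraic connectivity) of the graph Laplacian.

    Affinity = |C| + |C|^T, Degree = diagonal of the affinity row sums,
    Laplacian = Degree - Affinity. The graph is connected if and only
    if the second-smallest Laplacian eigenvalue is strictly positive.
    """
    A = np.abs(C) + np.abs(C).T          # symmetrized affinity
    D = np.diag(A.sum(axis=1))           # degree matrix
    L = D - A                            # (unnormalized) graph Laplacian
    eigvals = np.linalg.eigvalsh(L)      # eigenvalues sorted ascending
    return bool(eigvals[1] > tol)

# A 4-node example with two disconnected 2-node components.
C_split = np.array([[0., 1., 0., 0.],
                    [1., 0., 0., 0.],
                    [0., 0., 0., 1.],
                    [0., 0., 1., 0.]])
# A connected 4-node chain.
C_chain = np.array([[0., 1., 0., 0.],
                    [0., 0., 1., 0.],
                    [0., 0., 0., 1.],
                    [0., 0., 0., 0.]])
print(is_connected(C_split))  # False
print(is_connected(C_chain))  # True
```

Equivalently, the paper's stronger statement (all entries of C nonzero) implies a complete affinity graph, for which the second eigenvalue is trivially positive.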
- The whole paper is predicated on the idea that as one varies lambda
from 0 to infinity, one goes from LRR to SSC. But this is not the case,
since for lambda = 0 the constraint diag(C) = 0 is not present in
standard LRR. Moreover, I do not think there is a value of lambda for
which one gets LRR. I was disappointed that this is not mentioned
explicitly until the very end of Section 4. Can one write the
constraint as lambda diag(C) = 0 so that the analysis follows?

- Following on the point above, it was disappointing that no result on
graph connectivity is shown for lambda > 0. In other words, the paper
does not shed light on graph connectivity for SSC or LRSSC, only for LRR.
- The Low Rank Subspace Clustering (LRSC) work of Favaro deserves
a discussion. The equation in (6) is in fact an extension of LRSC, not of
LRR, because LRSC uses the Frobenius norm of the error, while LRR uses the
L2,1 norm. Of course, they coincide in the noiseless case. Arguably, the
name LRSC is more appropriate than LRR throughout, since we are not
talking about a low-rank representation in general, but one specifically
designed for the subspace clustering problem.
- The proof of the success of LRR for independent subspaces was not shown
in Vidal et al. PAMI 2005 [23]. It was shown in Section 2.3 of Vidal et
al. IJCV 2008, "Multiframe motion segmentation with missing data using
PowerFactorization and GPCA".
- To the best of my knowledge, the first application of subspace
clustering to hybrid system identification is Vidal et al. CDC 2003,
which is much earlier than Bako, Automatica 2011 [2].
- The experiments in Figure 9 seem to show performance that is worse than
that of SSC and LRR. As lambda goes to infinity, the error should
approximately converge to the SSC performance, which is 1.52% for 2
motions, 4.40% for 3 motions, and 2.18% for all motions. The reported
number of nearly 9% for 3 motions seems way off. I understand that the
parameter setting may be slightly different, but I wonder what happens if
one uses the parameter settings of Elhamifar. Would the behavior of
getting a smaller error for some lambda still hold in practice? Overall,
I think the experiments need to be redone to be made comparable with the
state of the art.

- The quality of English decreases as a function of the page number.
- The articles "the" and "a" are missing in several places (100, 123, 153, 154, 205, 271, 281).
- It should be "a d-dimensional subspace" (085).
- There should be a space before references (040, 041, 048, 221, 353).
- It should be "these subspaces, we say a matrix C obeys the SEP" in line 100.
- What is the \perp symbol in 115?
- It should be "the only difference being the definition of mu" in 179.
- It should be "Assume random sampling. If ..." in 207.
- It should be "Combining Lemmas 1 and 2, we get" in line 229.
- It should be "and random sampling. It holds" in 240.
- What is a "probabilistic theorem"? (line 245)
- It should be "ignoring constant terms" or "ignoring the constant C" or something like that in 256.
- The grammar of "The graph connectivity problem concerns when SEP is satisfied, whether each block of the solution C to LRSSC represents a connected graph" needs to be revised.
- It should be "If the subspaces are independent, X is not full rank, and the data points" in 274.
- It should be "et al.", not "et.al".
- There is a "the the" and an "if if" somewhere in the paper.
CLARITY: The paper very clearly
conveys the essence, originality, and novelty of the contribution. But I
had read the previous papers on the subject, in particular the papers
from Candes with the conditions for the deterministic case (inradius),
etc. I think that a reader unfamiliar with those papers may have
difficulty with Definition 3 and the examples that follow. The
technical details inside the proofs are of course harder to follow. One
minor issue is that a range of lambda values, or lambda above a threshold,
is mentioned early in the paper (e.g., 182), but such a range does not
appear until Remark 4, so the reader is left wondering which threshold
the paper is referring to. Another issue is that it is not clear what
"vice versa" refers to in Remark 4. Are you trying to say lambda smaller
than a threshold versus larger than a threshold?
ORIGINALITY: The
paper is highly original. The core contributions are to find weaker
conditions for the correctness of SSC, new conditions for the correctness
of LRR, and conditions for the correctness of a new method LRSSC. To the
best of my knowledge, all the material presented in the paper is new.
IMPACT: Subspace clustering is receiving increased attention in
the community, and sparse and low rank methods are showing excellent
performance in many datasets. Understanding why these methods work is in
my view fundamental. This paper presents a fundamental step towards
achieving this understanding.
Q2: Please summarize your review in 1-2 sentences
This paper proposes a new subspace clustering
algorithm called Low Rank Sparse Subspace Clustering (LRSSC) and aims to
study the conditions under which it is guaranteed to produce a correct
clustering. The presented conditions are novel. This paper presents a
fundamental step towards understanding why sparse and low-rank methods are
appropriate for subspace clustering.
Submitted by
Assigned_Reviewer_7
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes an algorithm for subspace
clustering that combines nuclear-norm and l1 minimization. Theoretical
guarantees show that this algorithm can succeed when the data satisfy
incoherence assumptions that are weaker than in previous work. The
algorithm was implemented using the ADMM method and run on large data
sets. Numerical experiments show that the algorithm can be tuned to trade
off between the competing goals of separating points in different
subspaces while maintaining graph connectivity between points in the same
subspace.
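For context, ADMM solvers for nuclear-norm-plus-l1 objectives of this kind typically rely on two proximal operators, which can be sketched as follows; this is a generic illustration under my own assumptions, not the paper's actual implementation:

```python
import numpy as np

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_1: element-wise soft thresholding."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def svd_threshold(Z, tau):
    """Proximal operator of tau * ||.||_* (nuclear norm): shrink the
    singular values of Z by tau via singular value thresholding."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt

# Toy usage on a small matrix (hypothetical data).
Z = np.array([[3.0, -0.5],
              [0.2, -2.0]])
S = soft_threshold(Z, 1.0)   # each entry shrunk toward zero by 1
L = svd_threshold(Z, 1.0)    # each singular value shrunk by 1
```

In an ADMM iteration for an objective like ||C||_* + lambda ||C||_1, each step alternates these two operators on split copies of the variable, together with a dual update enforcing the copies to agree.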
Quality: The results are interesting and the proofs are
quite substantial. The proofs use the same convex duality approach as
previous work on sparse subspace clustering (ref. 20), but are more
sophisticated due to the use of nuclear-norm minimization (reminiscent of
work on matrix completion).
Clarity: The paper is fairly clear,
but some notation is not adequately explained: in line 91, note that |C|
is the element-wise (not operator) absolute value; in line 113, the
notation \set{\Lambda_1(X)} is very awkward; in line 114, the L_infinity
norm and \diag^\perp notation need to be explained; in line 116, \span(X)
should be the column space of X; in line 225, what is the parameter d?
Also, the supplementary material has a number of confusing points:
in line 84, it would be helpful to define \supp(C) for readers not
familiar with results of this type; in line 147, one should state the
particular choice of \tilde{\Omega} that is needed here, and also fix the
grammar in that sentence; in line 244, it is not clear what the
"entrywise box constraint \lambda" refers to, though I suspect it is
constraint (5) in Lemma A.1.2; in sections A.3 and A.4, it is quite
difficult to follow the logical structure of the argument, and a better
explanation is needed.
Originality: This work seems fairly
original, in particular the notion of "minimax subspace incoherence," and
the use of a singular value condition to quantify how well each subspace
is represented by the observed data points.
Significance: This
work seems quite significant. The proposed algorithm is especially
interesting due to the trade-offs between the two goals of "separation"
and "graph connectivity."
Q2: Please summarize your review in 1-2 sentences
This paper has interesting results on a problem that
is technically difficult, both in theory and in practice.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews.
There are no constraints on how you want to argue your case, except that
your text should be limited to a maximum of 6000 characters. Note,
however, that reviewers and area chairs are very busy and may not read
long, vague rebuttals. It is in your own interest to be concise and to
the point.
First of all, we thank the reviewers for the
high-quality comments and thoughtful input. We will revise the paper's
notation, grammar, presentation, etc. We will also include the references
we missed and provide short discussions of those that are particularly
relevant. We address other comments briefly below.
To
Reviewer 4:
1. On how the experiment in Figure 1 supports
Theorem 2: The results in Figure 1 show clearly that SEP holds for
all lambda above some threshold, which is also what Theorem 2 predicts.
Though the result is only a sufficient condition, the experimental
threshold is of the same order as the one predicted by Theorem 2. We will
add the predicted value to the plots to further illustrate this.
2. On Figure 1 for LRR without the diag(C) = 0 constraint: The
experiments that produce Figure 1 are for disjoint subspaces whose total
dimension exceeds the ambient dimension. In such a case, LRR without the
diag(C) = 0 constraint returns the trivial solution C = identity.
3. On our "seems to imply" statement about LRR in Remark 2: This is not a
statement with a formal proof, but the proof of Theorem 1 does suggest
that LRR requires the independence assumption except in very specific
instances (we believe it is possible to show this formally for randomly
generated data, but that seems beyond the scope of this paper). Moreover,
the synthetic data experiments support the conjecture.
To Reviewer
5:
4. On defining connectivity via the second eigenvalue of the Laplacian:
Full connectivity implies the graph is a single connected component. We
think this definition is more basic than the equivalent definition via
the second eigenvalue of the Laplacian. In addition, the second
eigenvalue of the Laplacian is difficult to analyze.
5. On the lambda diag(C) = 0 constraint: This is a good idea, but for the
analysis, lambda > 0 does not reduce to LRR without the diagonal
constraint unless lambda actually reaches 0. In fact, we think the
constraint on the diagonal entries is natural in subspace clustering,
since representing a data point fully or partially by itself is not
meaningful when constructing the affinity graph. Without this constraint,
the solution of LRSSC is trivial (the identity) in disjoint-subspace
settings (e.g., that of Figure 1).
6. On our connectivity result: It is a very preliminary (and simple)
result. We understand the disappointment that it covers only LRR, not
LRSSC. We hope to address this and obtain stronger and more general
graph connectivity results for subspace clustering in future work.
7. On the experiments on Hopkins155: We will study Elhamifar's code,
redo the experiment, and try to make the results comparable. That our
LRR results end up worse than previously reported LRR results is,
however, understandable, since for a fair comparison we did not use any
post-processing of the affinity graph.
8. On the clarity of Remark 4: The threshold in Remark 4 is different
from the one elsewhere in the paper. In Remark 4, the threshold refers to
the range in which (2) is stronger than (3). The "threshold" in other
parts of the paper refers to the range of lambda that results in a
solution satisfying SEP; this can be seen from either (2) or (3). We
will make the statement clearer.
To Reviewer 7:
9. Line A.244: Yes, the entrywise box constraint refers to (5) in
Lemma A.1.2. We will make the statement clearer.
10. Line 255: d
is the dimension of each subspace.
 