
Submitted by
Assigned_Reviewer_1
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The authors present an algorithm for matrix
factorization with binary components; in this particular variant the
authors have a number of combinatorial (basically zeroone) constraints on
the elements of one of the two factor matrices.
The paper has
great theory, that was enjoyable to read. The use of the LittlewoodOfford
lemma is very elegant (even though it only holds for asymptotically large
values of m). The use of linear programming to further speed up the
overall algorithms is also a nice approach. Overall, this is very strong
paper from a theoretical perspective, certainly above the NIPS threshold.
My main concern, however, had to do with the motivation of this
particular factorization. While intuitively appealing (having a zeroone
matrix on the left hand size of a matrix factorization), it was not
obvious to me what applications this could have. The case study described
by the authors is certainly interesting, but somewhat exotic: to the best
of my knowledge, associations between cancers and shifts in methylation of
cell types are still vastly unexplored. I would like to see more
motivation for the proposed factorization, if possible.
Overall, a
solid paper that I would like to see published in
NIPS. Q2: Please summarize your review in 12
sentences
The authors present algorithms for a matrix
factorization with zeroone constraints on one of the two factor matrices.
A very good paper from a theoretical perspective; however, the overall
motivation for this factorization is rather weak. Submitted by
Assigned_Reviewer_3
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The paper presents an algorithm for lowrank matrix
factorization with constraints on one of the factors should be binary. The
paper has several novel contributions for this problem. The algorithm
guarantees the exact solution with the time complexity of O(mr2^r+mnr),
where previous approach (E. Meeds et al., NIPS 2007) uses MCMC algorithm
so that it cannot guarantee a global convergence. Under additional
assumptions on the binary factor matrix T, the uniqueness of T is proved
which means that each data point has a unique representation with the
columns of T. Using LittlewoodOfford lemma, the paper computes a
theoretical speedup factor for their heuristic of the candidate binary
vector set reduction step. I summarize the strengths (+) and
weaknesses () of this paper.
Quality (+) The proposed algorithm
exhaustively searches all candidates of binary vectors whose time
complexity is only exponential in the rank of the matrix by its
algorithmic procedure, and later a heuristic of the candidate set
reduction is suggested. Analysis of uniqueness and time complexity of the
algorithm is appropriate to support the algorithm. () I’m not sure this
theoretically wellsupported algorithm actually works better than the
previous approach (E. Meeds et al., NIPS 2007), since the paper doesn’t
provide empirical comparison between them. Is this comparison is possible?
As a minor comment about the algorithm, is there any reason to use the
vector p in each algorithms? It seems that this vector does not contribute
to the proof of theories.
Clarity (+) It is easy to follow the
overall procedure of the algorithm and its theoretical justification. ()
In my opinion, the statements about a nonnegative variant in the section
2.3 is obvious and unnecessary, and seems to be not connected with other
contents.
Originality (+) This is a novel contribution for matrix
factorization. The algorithm is completely different from the previous
one, and guarantees the uniqueness of the solution. Important references
about matrix factorization with binary matrices are included in the paper.
Significance: (+) Uniqueness of the solution may be useful for
many problems whose representation of each data points should be unique,
e. g. DNA sequence analysis, even if in most case, data has noise so that
this uniqueness property for the exact case does not satisfy. The paper
provides a unique theoretical behavior of matrix factorization with one
binary factor matrix. () As I mentioned earlier, empirical comparison
with the existing work is not provided, so I’m not sure the proposed
algorithm is better with realworld data. But this comparison is not
necessary if it is not possible, since the proposed algorithm is better
enough than the existing work with only its theoretical analysis about the
uniqueness of the solution.
Q2: Please summarize
your review in 12 sentences
This paper proposes a matrix factorization method with
constraints on one of the factors should be binary whose time complexity
is only exponential in the rank of the matrix. Theoretical analysis about
the uniqueness of the solution is a significant contribution for this
problem, and it would be better to add an empirical comparison with the
existing work. Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper discusses a new approach to binary matrix
factorization that is motivated by recent developments in nonnegative
matrix factorization. The paper is well written and very clear. The
goal of the paper is to present an algorithm for finding a
factorization of a matrix in the form $D = T A$ where the entries of
$T$ are in $\{0,1\}$. Such a model has wide applicability and is of
interest to the ML community. The algorithm has provable recovery
guarantees in the case of noiseless observations. A modified algorithm
is applied to the noisy setting; however, the authors do not establish
recovery guarantees.
Can the authors provide comparisons of
their method in the noisy setting to existing methods for solving
matrix factorization problems: for example the method proposed by
Bittorf et. al. or EM (that is if we assume the latent binary factors
are independent)? Such a comparison could serve to highlight some of
the advantages of the proposed method in the noisy setting versus
existing approaches. Those methods have much better computational
performance. Can the authors also discuss under what circumstances
they expect their method to perform better than the linear programming
(LP) based method (which also has theoretical guarantees)? The method
proposed in this report exploits the information that the matrix $T$
is binary; however, the LP based methods might also perform well in
such contexts.
What is the usefullness of the discussion on
uniqueness and nonnegative A? From my reading of Proposition 2, if
problem (1) has a unique solution $(T^*,A^*)$ and if those unique
solutions satisfy $A^* \geq 0$, then Algorithm 1 will return $T^*$ and
$A^*$. Doesn't Proposition (2) simply follow from Proposition (1) if
both equations (6) and (7) are satisfied? If the point is that simple,
I do not see the need to discuss it in such detail. Otherwise, could
the authors please clarify that point.
Q2: Please summarize your review in 12
sentences
Overall the paper is interesting and well written.
However, adding deeper comparisons to the existing work in matrix
factorizations and NMP will add better insight into the behavior of
the algorithm and the main contributions of this article.
Q1:Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We thank all reviewers for their constructive and
helpful comments.
R1: 'There is little motivation for the
particular factorization discussed in the paper. The case study, an
application to DNA methylation array data, is somewhat exotic.'
Analysis of methylation array data has in fact received little
attention outside the biomedical literature, where it is an important,
emerging topic (pubmed yields more than 15000 hits for “DNA methylation
AND cancer”). Advanced analysis of this kind of data is still in its
infancy, and we feel that the matrix factorization of the paper offers a
valuable contribution in this regard. Apart from that, the
factorization can potentially be applied to other problems in biology,
where “on/off phenomena” play a role (e.g. transcription factor binding,
presence/absence of SNPs), as well as to blind source separation as
mentioned in our introduction along with supporting references.
R2/R3: 'To better judge the performance of the proposed
method, it would be useful to include comparisons against existing
approaches to the problem such as [Meeds et al., NIPS 2007] and [Bittdorf
et al., NIPS 2012]'
We have meanwhile conducted an experimental
comparison to the LP approach of Bittdorf et al. in the separable setting
with binary T. We have found that our approach is more robust to noise,
and we plan to add these results in the final version. The model of
Meeds et al. involves two binary factors in a threefactor factorization
and is hence different from our factorization model. A comparison can
still be performed when running our approach in a twostep manner.
Provided that code can be obtained from Meeds et al, such a comparison
would be included in the final version. Please note that the paper
already contains a comparison to two methods based on alternating
optimization (a standard approach to NMF), adapted to our specific
factorization problem.
R2: 'It is worth discussing in which
settings the authors' method performs better than the LP method of
Bittdorf et al.'
The LP method does not require the left factor T
be binary, and it has no more than polynomial complexity in the rank r of
the factorization. On the other hand, (approximate) separability is
crucial to the LP approach – all theoretical results are limited to this
specific setting (even though one may argue that separability is satisfied
in major applications of NMF). Moreover, in the noisy case, the LP
approach needs suitable specification of a second tuning parameter
(depending on the noise level) besides the rank r. We plan to include
these points along with the results of the empirical comparison in the
final version.
R2/R3: 'The statement (Proposition 2) about a
variant with nonnegative A appears to be unnecessary and obvious in light
of Proposition 1 on the general case'
We agree that the statement
is obvious. Nevertheless, we believe that it deserves to be stated
(possibly more adequately as corollary to Proposition 1 in a revised
version), because 1) a modification to our algorithmic approach introduced
earlier for general A would be necessary without uniqueness 2) the
application presented in the last section requires nonnegative A.
R2: 'Is there any reason to use the vector p in the
algorithms/theory ? It seems that this vector is not essential.'
The vector p in aff(D) is necessary as we work in the affine hull
of D instead of its linear hull. However, similar results would be
possible for the linear hull (dropping the sumtoone constraints on A),
in which case one would omit p.
 