
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors propose a method for incorporating underlying graph structure among users/items in a collaborative filtering setting. The novelty is that this is the first method to both incorporate graph structure and deal with sparse data.
The method itself is fairly straightforward and clearly explained. It amounts to block-coordinate ascent, solving a convex problem at each iteration. One potential methodological contribution is solving for the low-rank factors using conjugate-gradient methods to avoid slow matrix decompositions, but it is unclear whether this is the first time such an approach has been proposed.
Connections are drawn to the nuclear norm minimization of Srebro and Salakhutdinov (2010), and the authors show that their method is equivalent to employing a weighted atomic norm, where the weights are derived from the graph structure. This connection allows them to derive a consistency bound on recovering the true low-rank matrix. The bound is specified in terms of a quantity \alpha^*, which is shown empirically to be much smaller for realistic matrices than the corresponding bound for matrix completion.
Finally, the method is validated on three real datasets and is shown to outperform standard matrix completion, achieving a given RMSE in much less time.
The work is of high quality and clearly explained. I am less confident of its originality, but if this is indeed the first time the conjugate-gradient method embedded in the coordinate-ascent scheme has been employed in a collaborative filtering setting, then it is a reasonably novel contribution. Overall, this seems to be a significant advance in exploiting known structure in features to improve collaborative filtering models.
Q2: Please summarize your review in 1-2 sentences
This paper represents a substantial contribution to the growing body of work around leveraging underlying structure among users/items for collaborative filtering. The paper is well-written and the contributions are both theoretical (consistency bounds in terms of a quantity that is empirically verified to be tighter) and practical in nature (a method faster by orders of magnitude, with superior performance on real datasets).
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Paper 1260 tackles the problem of matrix completion with side information represented as a graph. The authors claim that this is a general framework in that the side information can come from many different kinds of sources. The major contributions of this paper are a scalable learning algorithm for matrix completion with graph structures, connections to existing work in the matrix factorization literature, and verification of the proposed method both theoretically and experimentally.
As the authors claim, this problem is widely applicable, for example to recommendation systems with social network information. Considering the size of such datasets, it is important to build a scalable algorithm, as the authors propose.
This paper is well-structured and well-written. It is good to have both a theoretical analysis of consistency and experiments with real data. It is also useful that the proposed method is placed in the context of wider work on convex optimization and noisy matrix approximation.
Regarding the experiment (Section 6): some baselines in Table 1 are too simple; no one is likely to be interested in a comparison with the Global/User/Movie mean. I suggest comparing against more recent, state-of-the-art methods such as
Salakhutdinov and Mnih, Probabilistic Matrix Factorization [NIPS 2008]; Mackey et al., Divide-and-Conquer Matrix Factorization [NIPS 2011]; Lee et al., Local Low-Rank Matrix Approximation [ICML 2013].
It is impressive that the proposed method runs in O(10^3) seconds on the datasets in Table 2. However, to claim scalability, I recommend running the algorithm on larger datasets, such as Netflix or Yahoo! Music. This way it would also be possible to compare RMSE scores with other methods.
Minor comments: 1) Why do Figure 1 and Figure 2 use different x-axis formats? I recommend using the format of Figure 2 (10^1, 10^2, ...) instead of plotting log(time) directly; a log scale without a specified base is ambiguous. 2) Citations are not in NIPS format.
Q2: Please summarize your review in 1-2 sentences
This paper presents GRALS, an efficient way of optimizing the graph-regularized matrix completion problem. Despite some minor issues with the experiments, I find the paper generally well-written and clear.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper considers matrix completion in a setting where the row and column variables lie on graphs. The authors develop a scalable alternating least squares algorithm. Further, they show that the regularizer in the optimization problem can be seen as a generalized form of weighted nuclear norm, and derive statistical consistency guarantees for the low-rank matrix estimators. Experiments comparing to leading methods on a movie ratings dataset show their method achieving the lowest RMSE. Further, their ALS algorithm is shown to scale orders of magnitude better than SGD.
a. The description of how the row/column graphs are generated from the MovieLens dataset is vague in the paper; please clarify.
b. Sections 5 and 5.1 were difficult to follow. Terms such as "spikiness" are not defined, yet they are key to following the main theoretical result and the comparison to standard matrix completion.
c. Why is there no RMSE table for the 3 large datasets? From Fig. 2 it is unclear whether the RMSE of GRALS is equal to or poorer than that of the other methods (it is clear that GRALS scales better).
Q2: Please summarize your review in 1-2 sentences
The optimization problem considered in this paper (graph-structured matrix factorization with partial observations) appears to be novel and is likely to be of significant interest to the collaborative filtering community. The solution is based on weighted norm minimization (similar to the work of Srebro and Salakhutdinov (2010)). The alternating least squares algorithm developed by the authors is convincingly shown to be much more efficient than SGD, and it performs better than leading methods that include side information. However, I found the writing in key sections difficult to follow, and I have not checked the proofs.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their careful feedback and kind comments. We respond to the comments by each reviewer in detail below:

Reviewer_1:
Regarding our optimization innovations: In addition to what was noted by the reviewer, another major contribution of our work is the efficient Hessian-vector multiplication schemes (Algorithms 1 and 2) that enable conjugate gradient methods to be applied in the collaborative filtering setting. The Hessian-vector multiplication methods coupled with conjugate gradient make GRALS highly efficient, much more so than a direct application of conjugate gradient schemes.
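To illustrate the general idea (this is a minimal dense-NumPy sketch, not the authors' Algorithms 1 and 2; the function names, the dense 0/1 mask `S`, and the simplified objective are our own illustrative assumptions), one least-squares subproblem of graph-regularized ALS can be solved with conjugate gradient where the Hessian is never formed, only applied to a direction:

```python
import numpy as np

def hess_vec(S, H, L, lam, V):
    """Apply the Hessian of the subproblem
    min_W 0.5*||S*(A - W H^T)||_F^2 + 0.5*lam*tr(W^T L W)
    to a direction V (same shape as the user factors W).
    S: 0/1 mask of observed entries (m x n), H: fixed item
    factors (n x k), L: user-graph Laplacian (m x m)."""
    M = (V @ H.T) * S          # restrict to observed entries
    return M @ H + lam * (L @ V)

def solve_cg(S, A, H, L, lam, iters=100, tol=1e-10):
    """Conjugate gradient over matrix-shaped iterates with the
    Frobenius inner product; W solves the normal equations."""
    m, k = A.shape[0], H.shape[1]
    W = np.zeros((m, k))
    b = (A * S) @ H                         # right-hand side
    r = b - hess_vec(S, H, L, lam, W)
    p = r.copy()
    rs = (r * r).sum()
    for _ in range(iters):
        Hp = hess_vec(S, H, L, lam, p)
        alpha = rs / (p * Hp).sum()
        W += alpha * p
        r -= alpha * Hp
        rs_new = (r * r).sum()
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return W
```

Each CG iteration costs only a sparse masked product plus a Laplacian multiply, which is what makes the matrix-free approach attractive at scale.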
Reviewer_2:
1. Regarding the lack of comparison to the three suggested papers (Salakhutdinov and Mnih, Probabilistic Matrix Factorization [NIPS 2008]; Mackey et al., Divide-and-Conquer Matrix Factorization [NIPS 2011]; Lee et al., Local Low-Rank Matrix Approximation [ICML 2013]):

The SGD method we compare to in the paper was proposed in Zhou et al. 2012, which extends the work of Salakhutdinov and Mnih to the kernel setting. Section 4.2 in Zhou et al. shows that their method generalizes PMF, and hence we believed that a separate comparison to the latter was unnecessary.
The methods of Mackey et al. and Lee et al. solve the standard matrix factorization problem and do not take into account additional graph information among variables; hence we did not compare to these methods in Figure 2. We will be happy to include RMSE results from these methods (and some others) in the final version of the paper.
2. Regarding the suggestion to also run the algorithm on the Netflix or Yahoo! Music datasets:

The Netflix dataset only includes the year of release as side information for movies, and no user information. The Yahoo! Music data does not have a graph over users either, but it does provide user features, in much the same way as MovieLens, so we could construct a k-NN graph from these features. We will be happy to do so and provide a comparison in the final version of the paper.
3. We thank the reviewer for the suggestions regarding citations and axis labels, and will make the changes in the final version.

Reviewer_3:
1. Regarding the terse description of how the row/column graphs are generated from the MovieLens dataset:

We will expand upon our descriptions in the final version. The dataset contains user features such as age (numeric), gender (binary), occupation, etc., which we map into a 22-dimensional feature vector per user. We then construct a 10-nearest-neighbor graph using the Euclidean distance metric. We do the same for the movies, except in this case we have an 18-dimensional feature vector per movie.
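The graph construction described above can be sketched as follows (a minimal NumPy version under our own simplifying assumptions: dense feature matrix, unweighted edges, symmetrization by taking the union of neighbor sets; this is illustrative, not the code used in the paper):

```python
import numpy as np

def knn_graph(X, k=10):
    """Build a symmetric k-nearest-neighbor adjacency matrix from
    feature vectors X (n x d) under the Euclidean distance metric."""
    n = X.shape[0]
    # pairwise squared Euclidean distances via the expansion
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D, np.inf)        # exclude self-neighbors
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]    # k closest points to i
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)          # symmetrize (union of edges)
```

The resulting adjacency matrix then yields the graph Laplacian used in the regularizer.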
2. Regarding Sections 5 and 5.1 being difficult to follow, and terms not being defined:

We define the quantities \alpha and \beta in (14), and we mention in Section 5.1 that \alpha is the spikiness. As suggested by the reviewer, we will expand upon our descriptions and add further explanations in the final version.
3. Why is there no RMSE table for the 3 large datasets? From Fig. 2 it is unclear whether the RMSE of GRALS is equal to or poorer than that of the other methods (it is clear that GRALS scales better):

Figure 1 was devoted to an RMSE comparison, on the MovieLens dataset, among different statistical estimators, while Figure 2 was devoted to comparing the scalability of just those algorithms that take the graph into account, on other, larger datasets. We will gladly add an RMSE comparison over the datasets from Fig. 2 to the appendix in the final version (for reasons of space). We also report the results below:
DATASET    PMF    GRALS  SoREC  SoREG  RTSE
Epinions   1.173  1.162  1.144  1.232  1.278
Flixster   0.923  0.845  1.018  1.034  0.975
PMF is the method of Salakhutdinov and Mnih; SoREC, SoREG, and RTSE are variants of methods proposed by Ma et al., 2009. We will add results for Douban as well in the final version.
Reviewer_4, Reviewer_6:
Thank you for your kind comments.

