Review for NeurIPS paper: Sampling-Decomposable Generative Adversarial Recommender

NeurIPS 2020

Review 1

Summary and Contributions: This paper addresses the training difficulty of an existing GAN-based recommender model, IRGAN, and proposes a new recommender model called the sampling-decomposable generative adversarial recommender (SD-GAR). SD-GAR overcomes the divergence of the generator through self-normalized importance sampling and decouples the sampling method from the training of the generator. Furthermore, it uses an ALS-based method for efficient training. Experimental results show that SD-GAR outperforms IRGAN and other baseline models.

Strengths: 1. This paper effectively analyzes the limitation of the existing GAN-based recommender model, IRGAN. 2. This paper is well-written and shows a thorough analysis and proof of the proposed model. 3. It shows an extensive experimental evaluation with various datasets.

Weaknesses: 1. More thorough ablation studies are needed, such as the effect of the loss function, the effect of different sampling methods, and the effect of the training methods. 2. It would be better to compare against the state-of-the-art GAN-based recommender model. Please refer to the following reference. - Chae et al., "CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks," CIKM 2018 https://dl.acm.org/doi/10.1145/3269206.3271743 3. It would be better to report NDCG@N for a smaller N, e.g., 10 or 25.
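For reference, the cutoff N discussed in point 3 enters NDCG@N as sketched below. This is a minimal illustration assuming binary relevance and a log2 discount, which is one common convention and not necessarily the exact definition used in the paper; the function names `dcg_at_n` and `ndcg_at_n` are hypothetical.

```python
import math


def dcg_at_n(ranked_items, relevant_items, n):
    """Discounted cumulative gain over the top-n ranked items (binary relevance)."""
    return sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets 1 / log2(2)
        for rank, item in enumerate(ranked_items[:n])
        if item in relevant_items
    )


def ndcg_at_n(ranked_items, relevant_items, n):
    """DCG@n divided by the best achievable DCG@n for this user."""
    # The ideal ranking places every relevant item first, but a user with
    # fewer than n held-out items cannot fill all n positions.
    ideal_hits = min(len(relevant_items), n)
    if ideal_hits == 0:
        return 0.0
    ideal_dcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg_at_n(ranked_items, relevant_items, n) / ideal_dcg
```

Calling `ndcg_at_n(ranking, held_out_items, 10)` and `ndcg_at_n(ranking, held_out_items, 50)` for the same user shows how much of the score comes from positions beyond the first few.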

Correctness: The proposed method is correct. However, it would be better to show a more thorough evaluation, including a state-of-the-art model, i.e., CFGAN.

Clarity: This paper is well-written, and the organization of this paper is clear.

Relation to Prior Work: A discussion of recent GAN-based recommender models would be welcome. Also, the paper needs to include more recent state-of-the-art recommender models, such as graph convolution-based recommender models and autoencoder-based recommender models. Please refer to the following papers. - Xiangnan He et al., "LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation," SIGIR 2020 https://arxiv.org/abs/2002.02126 - Dawen Liang et al., "Variational Autoencoders for Collaborative Filtering," WWW 2018 https://arxiv.org/pdf/1802.05814.pdf

Reproducibility: Yes

Additional Feedback: After the rebuttal, I found that the experimental results for CFGAN are much lower than those reported in other papers, including CFGAN and IRGAN. Please refer to the following papers. Chae et al., "CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks," CIKM 2018 https://dl.acm.org/doi/pdf/10.1145/3269206.3271743 Zhou et al., "Recommendation via collaborative autoregressive flows," Neural Networks 2020 https://www.sciencedirect.com/science/article/pii/S0893608020300873 I would recommend that the authors check the evaluation setup and hyperparameter tuning.

Review 2

Summary and Contributions: The paper revisits an important work, IRGAN, which uses a GAN to sample training examples for IR problems, and identifies two problems with it (with theoretical analysis): deviation due to limited capacity, and high sampling complexity. Moreover, it proposes solutions (a new objective and decomposable sampling) that nicely address these problems. The experiments are quite extensive with various strong baselines, the improvement in both ranking performance and time consumption is significant, and the setting looks convincing to me. Lastly, the paper is well written in all aspects (intuition, theory, experiments) and easy to follow. I consider this a solid improvement on IRGAN that makes it much better and more efficient.

Strengths: - Nicely identifies, analyzes, and addresses the two problems in IRGAN with reasonable methods. - Extensive experiments and impressive performance. - Well written.

Weaknesses: - IRGAN is designed for various IR tasks, whereas this paper focuses only on recommendation. However, the results on recommendation look solid to me, so I guess the method may work well on other tasks, though it would be better to have those results as well.

Correctness: Yes

Clarity: Good

Relation to Prior Work: Good

Reproducibility: Yes

Additional Feedback: The rebuttal mostly addresses my concerns, and the supplementary results make the paper more convincing to me. Hence I maintain my rating of accept.

Review 3

Summary and Contributions: This paper analyzes the well-known GAN-based information retrieval framework IRGAN in the recommendation setting. It proposes multiple interesting modifications that significantly improve its training efficiency and scalability for recommendation tasks. Specifically, the paper first points out two problems of IRGAN: (1) the simple GAN objective could bias the optimal negative sampler toward extreme cases (delta distributions), and (2) sampling from the optimal negative sampler is computationally expensive. To address (1), the paper proposes adding an entropy regularization that smooths the (optimal) negative sampler distribution. To address (2), the paper suggests using self-normalized importance sampling to approximate the optimal negative sampler found in (1), where sampling from the proposal distribution can be decomposed into two-step categorical sampling. Further, the paper describes a strategy for learning the proposal distribution by minimizing the estimation variance through a constrained optimization. I have read the author feedback and I believe my review conclusion does not need to change.
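To make the two ingredients summarized above concrete, here is a minimal sketch of self-normalized importance sampling with a two-step categorical proposal. It is an illustration under stated assumptions, not the authors' implementation: the target is assumed to be proportional to exp(score / temperature) (the usual form of an entropy-regularized optimum), the proposal is assumed to be a mixture over latent clusters, and all function and parameter names are hypothetical.

```python
import math
import random


def two_step_sample(p_cluster, p_item_given_cluster, rng):
    """Draw one item: first a latent cluster k, then an item from that cluster."""
    k = rng.choices(range(len(p_cluster)), weights=p_cluster)[0]
    item_probs = p_item_given_cluster[k]
    return rng.choices(range(len(item_probs)), weights=item_probs)[0]


def proposal_prob(item, p_cluster, p_item_given_cluster):
    """Marginal proposal probability q(i) = sum_k q(k) * q(i | k)."""
    return sum(pk * p_item_given_cluster[k][item] for k, pk in enumerate(p_cluster))


def snis_estimate(f, scores, temperature, p_cluster, p_item_given_cluster,
                  num_samples, rng):
    """Estimate E_p[f(i)] where p(i) is proportional to exp(scores[i] / temperature).

    The normalizer of p is never computed: the unnormalized importance
    weights are divided by their own sum (self-normalization).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        i = two_step_sample(p_cluster, p_item_given_cluster, rng)
        q_i = proposal_prob(i, p_cluster, p_item_given_cluster)
        w = math.exp(scores[i] / temperature) / q_i  # unnormalized weight
        weighted_sum += w * f(i)
        weight_total += w
    return weighted_sum / weight_total
```

With a seeded `random.Random` instance passed as `rng`, the estimate can be compared against the exact expectation on a small item set. The closer the proposal is to the target, the lower the variance of the estimate, which is the motivation for learning the proposal.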

Strengths: The idea of the paper is novel and exciting. The paper writing quality is good, but some parts need further clarification.

Weaknesses: The experimental section is suspicious in terms of both baseline and metric choice. For baselines, there are many strong algorithms proposed and used in practice, such as the classic WRMF published in 2008 (still the winner of the RecSys 2018 recommendation challenge), VAE-CFs, and Neural Collaborative Filtering (NCF) models. Why use only BPR, CML, etc., which have been repeatedly outperformed on the benchmark datasets? The authors probably need to justify the baseline choice a bit. For the metric, the cutoff of 50 in NDCG@50 is too large for the datasets used in this paper. Few users have around 50 historical interaction records. It is hard to justify the performance of the proposed algorithm. An improvement of over 10% on benchmark datasets is especially hard to believe after years of many good algorithms being published.

Correctness: yes

Clarity: Yes, the paper is in good shape. However, as stated, some sections need more clarification.

Relation to Prior Work: yes

Reproducibility: Yes

Additional Feedback: Suggestions: 1. The paper is hard to follow because it contains many mathematical descriptions but omits the connections among them. For example, after describing importance sampling in Proposition 2.1, the paper turns directly to explaining the variance of the proposal distribution without mentioning why we want to look at it. It would be better to state beforehand, in one or two sentences, that reducing the variance of the estimator could help optimize the proposal distribution. 2. The connection between Equation 7 and Proposition 2.2 is also a bit off. It would be better to state directly why maximizing Equation 7 is equivalent to minimizing the estimator variance. Proposition 2.2 does not show or indicate the optimization objective explicitly. 3. In Equation 2, setting a distribution equal to a one-hot vector is odd; maybe use Cat(K, one-hot(...)). 4. Conduct a comprehensive comparison to the SOTA models published in the last three years, if possible.

Review 4

Summary and Contributions: This paper points out the limitations of previous generative-retrieval recommender systems: 1) sampling items from the generator is time-consuming and 2) the discriminator performs poorly in top-k item recommendation, which would be caused by the divergence between the generator and the optimum. To tackle these challenges, this paper proposes the Sampling-Decomposable Generative Adversarial Recommender (SD-GAR), which compensates for the divergence between the generator and the optimum and decomposes the generator to reduce its sampling cost.

Strengths: 1. This paper provides a theoretical analysis of the generative-retrieval recommender system. The authors theoretically show its optimal sampling distribution and the approximation to the optimum. 2. This paper proposes a closed-form solution for the optimization of the generator as well as a new sampling-decomposable generator. 3. The proposed algorithm achieves remarkable improvements over the state-of-the-art competitor in terms of both scalability and recommendation accuracy.

Weaknesses: 1. Some related papers are neither investigated nor discussed in this paper. The authors are strongly encouraged to include a survey of the related work. 2. The experiments are not solid: 1) unclear hyperparameter setups, 2) weak analysis of the divergence of G (or the poor performance of D).

Correctness: Overall, the claims of this paper are technically sound. However, the empirical evaluation of the proposed method is not sufficient to validate the claims of the paper. The optimal hyperparameter values for the competitors are not thoroughly searched (or not mentioned at all). For example, the L2-regularization coefficient in BPR and the margin size in CML largely affect their final performance, so they should be specified. Only a single metric (NDCG@50) is used for evaluation. More ranking metrics should be reported, including MRR and MAP. When the latent dimension is small, comparing GAN-like methods with traditional latent factor models is unfair because the number of parameters in GAN-like methods is much larger than that in latent factor models.
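For reference, the two additional metrics requested above can be computed as sketched below. This is a minimal illustration assuming binary relevance and one common convention for average precision (normalizing by the number of relevant items); the function names are hypothetical and not taken from the paper.

```python
def reciprocal_rank(ranked_items, relevant_items):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant_items:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_items, relevant_items):
    """Mean of precision@rank taken at the rank of each relevant item."""
    if not relevant_items:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant_items:
            hits += 1
            precision_sum += hits / rank
    # Assumed convention: divide by the total number of relevant items.
    return precision_sum / len(relevant_items)


def mean_over_users(metric, rankings, relevants):
    """Average a per-user metric; yields MRR or MAP depending on `metric`."""
    values = [metric(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(values) / len(values) if values else 0.0
```

Passing `reciprocal_rank` to `mean_over_users` gives MRR, and passing `average_precision` gives MAP, over the same per-user rankings used for NDCG.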

Clarity: The paper is well written overall, but some parts should be made clearer. First, the authors claim that the discriminator D (rather than the generator G) should be considered the recommender due to the data sparsity issue in the generator G. However, the authors also repeatedly mention that the discriminator D performs poorly in top-k recommendation, which is confusing for readers. Second, some typos (e.g., the index u in Eq. 4) should be corrected.

Relation to Prior Work: Some related papers are neither investigated nor discussed in this paper. [1] CFGAN: A Generic Collaborative Filtering Framework based on Generative Adversarial Networks. [2] Adversarial Binary Collaborative Filtering for Implicit Feedback.

Reproducibility: No

Additional Feedback: