__ Summary and Contributions__: This paper proposes to debias recommendation via an optimization framework that optimizes against the worst-case risk, which is a new idea in recommendation debiasing. The theoretical analysis is also interesting and insightful.

__ Strengths__: 1. Debiasing against the worst-case exposure strategy is new
2. Theoretical analysis is interesting

__ Weaknesses__: The novelty of the method seems limited; the authors should compare against other similar works.
1. The worst-case optimization framework is similar to DRO [1*]. The difference is that the authors optimize against the worst-case exposure strategy, rather than the worst-case group performance as in DRO.
2. [2*] also proposes a dual learning algorithm that simultaneously learns the unbiased exposure distribution and the user preference. More experiments could be conducted against this type of work.
In the experiments, the proposed method shows only minor improvement over POP as the propensity-weighting function. I would also like to see some explanation of why, in Table 1, g cannot be the oracle in ACL, and why MLP/oracle is worse than MLP/Pop.
[1*] Fairness Without Demographics in Repeated Loss Minimization, https://arxiv.org/abs/1806.08010
[2*] Unbiased Learning to Rank with Unbiased Propensity Estimation, SIGIR 2018
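As a concrete point of reference for the POP comparison above, here is a minimal sketch of the inverse-propensity-scored (IPS) risk with popularity-based propensities; the function names, clipping parameter, and normalization are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def ips_loss(losses, propensities, clip=0.1):
    """Inverse-propensity-scored empirical risk.

    losses:       per-(user, item) loss values on observed feedback
    propensities: estimated exposure probabilities p(observed | user, item)
    clip:         lower bound on propensities to control variance
    """
    w = 1.0 / np.maximum(propensities, clip)
    return float(np.mean(w * losses))

def pop_propensities(exposure_counts):
    """Popularity-based propensities ("POP"): normalize item exposure
    counts so the most-exposed item gets propensity 1."""
    counts = np.asarray(exposure_counts, dtype=float)
    return counts / counts.max()
```

Under this sketch, items with low estimated exposure probability receive larger weights, which is exactly where a misspecified propensity model (as opposed to the worst-case formulation) can inflate variance.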

__ Correctness__: Yes

__ Clarity__: The notation is overwhelming, making it hard to concentrate on the main message that the authors try to deliver to this community.

__ Relation to Prior Work__: Not really. The relationships with DRO and with dual learning for debiasing are not discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: For questions, see the weaknesses part. Overall, I found this paper interesting and insightful to read. I think it is debatable to say the exposure strategy is unknown when we control the exposure strategy in the system; that is, although there are unobserved factors, these factors may have no impact on an exposure strategy that does not consider them.

__ Summary and Contributions__: In this paper, the authors study the unbiased recommendation problem and propose an adversarial strategy to learn both the recommendation and exposure models.
Both theoretical and experimental analyses have been conducted to validate the effectiveness of the proposed method.

__ Strengths__: 1. This paper studies an important problem.
2. The idea of an adversarial game between the recommendation model and the exposure model is novel. How to learn IPS scores is an important problem with few existing solutions. The proposed method is robust and has the potential to learn better IPS scores.
3. The authors provide rigorous theoretical analyses to validate the superiority of the proposed adversarial strategy.

__ Weaknesses__: 1. Unclear motivation. The authors state that “the recommendation model is optimized over the worst-case exposure mechanism” but fail to give a clear motivation for this choice. Why is optimizing with the worst-case exposure better than optimizing with the expected exposure, which is widely adopted by existing methods?
It seems that the essential advantage of the proposed method is robustness. Uncertainty is not a good motivation, as it has been considered by existing methods and cannot answer the above question.
2. Insufficient experiments. The proposed method should be compared with existing unbiased recommendation methods (e.g., [a1][a2][32]) to validate the effectiveness of the adversary.
3. Insufficient related works:
3.1 For the technical part, the formulation of the objective (Eq. (4)) is a Wasserstein distributionally robust stochastic optimization (DRSO) problem [a5][a6]. The differences in solutions and generalization bounds between this paper and [a5][a6] should be discussed.
3.2 Some important related works on unbiased recommendation [a1][a2] and exposure-based recommendation [a3][a4] are missing. Also, the authors should better summarize the weaknesses of existing unbiased recommendation methods, since uncertainty has already been considered by these methods.
[a1] SIGIR'18: Unbiased Learning to Rank with Unbiased Propensity Estimation
[a2] WSDM'20: Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback
[a3] CIKM'18: Modeling Users' Exposure with Social Knowledge Influence and Consumption Influence for Recommendation
[a4] WWW'19: SamWalker: Social Recommendation with Informative Sampling Strategy
[a5] Distributional Robustness and Regularization in Statistical Learning
[a6] Regularization via Mass Transportation
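To make points 1 and 3.1 concrete, the contrast in question can be written as follows, in generic notation (not the paper's own symbols): $\ell_{\theta}$ is a per-example loss, $\hat{p}$ an estimated exposure distribution, and $W$ a Wasserstein distance.

```latex
% Expected-exposure objective of standard IPS-style methods:
\min_{\theta} \; \mathbb{E}_{(u,i) \sim \hat{p}} \big[ \ell_{\theta}(u,i) \big]

% Worst-case (Wasserstein DRSO) objective, as in [a5][a6]:
\min_{\theta} \; \sup_{p \,:\, W(p, \hat{p}) \le \epsilon} \;
  \mathbb{E}_{(u,i) \sim p} \big[ \ell_{\theta}(u,i) \big]
```

The second objective hedges against misestimation of $\hat{p}$ within radius $\epsilon$, which is why a comparison of solution methods and generalization bounds against [a5][a6] seems necessary.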

__ Correctness__: The claims and method are correct.

__ Clarity__: Yes

__ Relation to Prior Work__: Unclear. For details, refer to weakness 3.

__ Reproducibility__: Yes

__ Additional Feedback__: I have read the authors' response and increased my score. I am fine with accepting the paper.

__ Summary and Contributions__: This paper investigates an important issue in recommender systems regarding exposure bias. While previous work addresses this problem through propensity-weighting approaches, this paper presents an interesting angle from adversarial learning to tackle the identifiability issue caused by implicit user feedback. The authors derive learning bounds for the proposed minimax optimization problem and a robust offline evaluation metric via the introduced adversarial model. They evaluate their method on simulated and real-world datasets and perform online experiments.

__ Strengths__: * Interesting idea of leveraging adversarial training to tackle the exposure bias problem.
* Theoretically grounded in minimax optimization and counterfactual learning; derives a relaxation of the problem into a two-player game with corresponding learning bounds.
* This problem is relevant to the NeurIPS community and especially to researchers in CI/RecSys.

__ Weaknesses__: * The empirical evaluation of this paper is relatively weak, and more discussion of the results is needed to support the authors' claims. For example, most of the improvements are marginal on Goodreads and LastFM (the number scales in Table 1 are not consistent). Please include statistical tests for these results. Also, why does ACL-MLP with an MLP exposure model outperform MLP with an Oracle exposure model, and what does this indicate?
* PS is a rather basic baseline for comparison. I would be curious to see how ExpoMF [1] and follow-up works such as [2] compare to the proposed adversarial learning method.
* The organization of the paper could be improved. Some of the theoretical results could be moved to the appendix, some results should be discussed further (e.g., Theorem 1), and more explanation of the empirical results could be included in the main paper.
[1] Liang, D., Charlin, L., McInerney, J., & Blei, D. M. (2016). Modeling user exposure in recommendation. In Proceedings of the 25th International Conference on World Wide Web.
[2] Wang, M., Gong, M., Zheng, X., & Zhang, K. (2018). Modeling dynamic missingness of implicit feedback for recommendation. In Advances in Neural Information Processing Systems.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Please discuss how the local minimax ERM problem (referred to at line 144 and in Equation 4) connects to and differs from Wasserstein-based distributionally robust optimization [3,4].
[3] Duchi, J., & Namkoong, H. (2018). Learning models with uniform performance via distributionally robust optimization. arXiv preprint arXiv:1810.08750.
[4] Ruidi Chen and Ioannis Ch Paschalidis. A robust learning approach for regression models based on distributionally robust optimization. The Journal of Machine Learning Research, 19(1):517–564, 2018.
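For reference, [3] formulates DRO with an f-divergence ambiguity set, whereas [4] uses a Wasserstein ball; in generic notation (not the paper's), the two ambiguity sets give:

```latex
% f-divergence ball as in [3]:
\min_{\theta} \; \sup_{p \,:\, D_f(p \,\|\, \hat{p}_n) \le \rho} \;
  \mathbb{E}_{Z \sim p} \big[ \ell_{\theta}(Z) \big]

% Wasserstein ball as in [4]:
\min_{\theta} \; \sup_{p \,:\, W(p, \hat{p}_n) \le \epsilon} \;
  \mathbb{E}_{Z \sim p} \big[ \ell_{\theta}(Z) \big]
```

Positioning the paper's adversary over exposure mechanisms relative to these two ambiguity-set choices would directly answer the question above.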

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This work claims that existing methods relying on counterfactual modeling make problem-specific or unjustifiable assumptions to bypass the identifiability issue. In contrast, this work utilizes the uncertainty brought by the identifiability issue and treats it as an adversarial component. Specifically, the authors propose a minimax objective function and optimize it over the worst-case exposure mechanism. By applying duality arguments and relaxations, they show that the minimax problem can be converted into an adversarial game between two recommendation models.

__ Strengths__: 1. The approach is novel and the problem is interesting. Although some work uses counterfactual learning approaches from causal inference to address the recommendation problem, it ignores the identifiability issues caused by the partial-observation nature of user feedback data. Recent work introducing adversarial modeling to solve the identifiability issue in observational studies focuses mostly on learning balanced representations rather than on propensity-weighting methods.
2. The authors propose a minimax objective function for counterfactual recommendation and convert it into a tractable two-model adversarial game. Furthermore, they prove generalization bounds for the proposed adversarial learning and analyze the minimax optimization properties.

__ Weaknesses__: 1. I think the key contribution of this work is identifying the unjustifiable assumptions about identifiability in existing work. Unfortunately, the authors do not provide a theoretical analysis of the difference between existing counterfactual learning for recommendation and the proposed method. The analysis of supervised learning for recommendation is not convincing.
2. The experimental analysis is not sufficient; more concrete experiments are needed. It is very important to demonstrate effectiveness against SOTA methods (recommendation with causal inference), and to show results on other standard datasets, which would demonstrate the effectiveness of the proposal under different dataset complexities. A more detailed analysis of the effect of the identifiability issue would be desirable.
3. No broader impact section.

__ Correctness__: The paper seems to be correct.

__ Clarity__: The paper is well written.

__ Relation to Prior Work__: The related work is adequately discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: