NeurIPS 2020

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Meta Review

The paper provides a very general minimax framework for quantifying the bias/approximation error in off-policy evaluation (OPE), and the results apply to a range of OPE methods. Reviewers generally agree that this is a good paper and that it makes a contribution. One direction for improvement would be to quantify the statistical noise in off-policy evaluation, which is nontrivial but extremely important. Reviewers, the AC, and the SAC also agree that such an analysis could be left for future work. We would also strongly suggest that the authors consider rephrasing, or explaining, their use of the term "confidence interval". In statistics, a CI is mainly used to quantify statistical error rather than approximation error. The current paper uses "CI" to describe a form of approximation error but does not give a statistical error analysis. This use of "CI" could lead to misunderstanding and should be clarified in the abstract and introduction.