NeurIPS 2020

Self-Imitation Learning via Generalized Lower Bound Q-learning

Meta Review

The author response satisfactorily addressed the reviewers' concerns regarding the contraction/bias tradeoff, the disconnect between the experimental results and the theory, and the variance of the estimator. This led one reviewer to increase their score for this paper, which already had reasonably solid scores.