Paper ID:6282
Title:Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

The reviewers were in consensus about the merits of this paper, in particular the value of the proposed approach and the theoretical analysis. Some concerns were raised about the experimental validation, but these have been alleviated by the new results and baselines added during the rebuttal. Some concerns remain regarding the clarity of the paper. The authors state that they have revised the text, but we are unable to see the revision and so cannot verify that the clarity has improved. The authors are strongly encouraged to put real effort into improving the clarity of the final version. Overall, a solid paper.