NeurIPS 2020

Value-driven Hindsight Modelling

Meta Review

Learning value functions is a central theme in reinforcement learning. It is a hard problem because of the non-stationary nature of bootstrapping. This paper proposes a fresh approach to improving value-function learning by conditioning the value function on information about future states at training time (hindsight). Conditioning on the right future data should provide more certainty about the future return. All the reviewers liked the premise of the paper, its clear motivation, and its thorough experiments. The reviewers raised some good technical questions, including about the reliance on trajectories even for off-policy methods. The authors provided a thoughtful rebuttal addressing these concerns. The paper was discussed in the post-rebuttal phase, and everyone agreed that it provides interesting insights worth sharing with the community. Please refer to the reviewers' final comments and incorporate their responses in the camera-ready version.