NeurIPS 2020

Automatic Curriculum Learning through Value Disagreement

Meta Review

This paper tackles adaptive goal sampling for automatic curriculum learning, using the value disagreement of an ensemble of models as a proxy for which goals are worth practicing. The method is clearly motivated and explained, and a wide range of experiments shows that its mechanics work as intended, with improvements over baselines on some continuous control tasks. However, reviewers pointed to several crucial pieces of prior work that I would expect the authors to cite and discuss in the final draft, in particular Bootstrapped DQN (Osband et al., 2016), which uses disagreement in value space to aid learning in RL.
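To make the summarized mechanism concrete, here is a minimal sketch of goal sampling by value disagreement. It is an illustrative reading of the one-sentence description above, not the authors' implementation. It assumes that disagreement is measured as the population standard deviation of the ensemble's value estimates for a (state, goal) pair, and that candidate goals are drawn with probability proportional to that score. The names `value_disagreement` and `sample_goal` and the toy ensemble are hypothetical.

```python
import random
import statistics
from typing import Callable, Optional, Sequence

# Hypothetical value function signature: V(state, goal) -> float
ValueFn = Callable[[object, object], float]


def value_disagreement(ensemble: Sequence[ValueFn], state, goal) -> float:
    """Assumed proxy: spread (population std) of the ensemble's value estimates."""
    values = [v(state, goal) for v in ensemble]
    return statistics.pstdev(values)


def sample_goal(ensemble: Sequence[ValueFn], state, candidates: Sequence,
                rng: Optional[random.Random] = None):
    """Sample one candidate goal with probability proportional to disagreement."""
    rng = rng or random.Random()
    scores = [value_disagreement(ensemble, state, g) for g in candidates]
    if sum(scores) == 0:
        # Every member agrees on every goal: fall back to uniform sampling,
        # since random.choices rejects all-zero weights.
        return rng.choice(candidates)
    return rng.choices(list(candidates), weights=scores, k=1)[0]


# Toy ensemble (hypothetical): members agree on goal 0 and disagree on goal 1.
ensemble = [lambda s, g: 1.0 * g, lambda s, g: 2.0 * g, lambda s, g: 3.0 * g]
```

In this toy setup every member predicts 0 for goal 0, so its disagreement is zero and `sample_goal(ensemble, None, [0, 1])` always returns goal 1, the goal on which the ensemble is uncertain.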