NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper proposes a natural extension of UCRL2 to the problem of learning state representations. The proposed algorithm chooses optimistically among a finite set of candidate MDPs and their corresponding policies. The analysis yields regret bounds that improve on existing ones. The paper was discussed, and all reviewers agree that it is a natural extension of UCRL2 that deserves to be published.