Reviews: Regret Bounds for Learning State Representations in Reinforcement Learning

This paper proposes a natural extension of UCRL2 to learning state representations. The proposed algorithm chooses optimistically among a finite set of candidate MDPs and their corresponding policies. The algorithm is analyzed, and the resulting regret bounds improve on existing ones. The paper was discussed, and all reviewers agreed that it is a natural extension of UCRL2 that deserves to be published.

Paper ID:	6926
Title:	Regret Bounds for Learning State Representations in Reinforcement Learning