NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper contributes useful structural results in regret minimization in the Markov Decision Process setting of RL, specifically for the class of tabular (i.e., unstructured) finite-horizon episodic MDPs. The paper is likely to stimulate the finite-sample analysis of online learning in MDPs via its new theoretical techniques.