Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Because the initial reviews were mixed, I obtained an additional review from an expert in the area of this paper. This fourth review came back clearly positive, but in the meantime one of the positive reviewers changed to negative (and later one of the negative reviewers changed to positive). A lot of discussion followed, but the reviewers never agreed on how best to view this paper. In fact, they seemed to talk past each other, and in the end we had two positive and two negative reviews.

As the area chair, reading the reviews and listening to the discussion, I found the fourth, very positive review to be the most compelling. This review contended that the authors make "solid and important contributions" to the theory of reinforcement learning, in particular to the finite-time analysis of Sarsa. The review notes that the "main difficulty and complexity" in this topic is dealing with the time-varying nature of the Markov chains. The paper's solid and important contributions are to apply, and significantly extend, the prior basic theoretical work on time-varying Markov chains by Mitrophanov. The negative reviewers did not dispute this view.