NeurIPS 2020

Optimal Best-arm Identification in Linear Bandits


Meta Review

This paper proposes a "Track-and-Stop" algorithm for linear bandits. Even if the idea is natural, the analysis of such an algorithm requires a few technical challenges, that are well highlighted in the paper and in the rebuttal. Moreover, this paper is the first to suggest and analyze "lazy updates" for Track-and-Stop, which is computationally interesting. We still recommend some clarification in the revision. The rebutall claims that "the implementation and performance guarantees [of the algorithm] are completely independent of K". It is unclear whether this is true for the *implementation* (not the main focus of the paper, though) as computing the oracle w(t) when it needs to be updated probably depends on K. Still regarding the oracle computation, it would be good to explain how it is done precisely. "Frank-Wolfe" is briefly mentioned in the paper, however there is no convergence guarantees for such an algorithm and [Degenne et al. 20] even provide an example in which it does not converge. Some further discussion on this issue would be appreciated. One minor comment to finish: using the acronym "TS" for Track-and-Stop may be misleading, as it often refers to "Thompson Sampling" in the bandit literature.