NeurIPS 2020

Temporal Variability in Implicit Online Learning


Meta Review

This paper considers the implicit update algorithm for online learning (a.k.a. proximal updates in the optimization literature). It is shown that the algorithm achieves a regret bound that is adapted to the variability of the sequence of loss functions. This holds even without the smoothness of the loss. I believe this is a firm contribution to the fields of online learning and stochastic optimization. Firstly, Implicit updates are known to have practical advantages, but their theoretical understanding has been limited to the fact that they enjoy the same worst-case regret guarantees as their explicit counterparts. This is one of a very few works (if not the first one) which shows a nontrivial advantages of the implicit methods and thus makes a significant progress in better understanding of their behavior. Secondly, the previous approaches to dealing with limited temporal variability of the loss, such as optimistic MD, require the loss to be linear or smooth. In contrast, the authors show that their method achieves O(1) regret when the temporal variability is constant without smoothness assumption. Finally, the authors provide an adaptive version of the algorithm (AdaImplicit) which does not need any knowledge on the variability V_T, and simultaneously guarantees a regret bound adapted to the variability and the worst-case bound of AdaFTRL. The was a discrepancy in the scores assigned by the reviewers. The paper was found to be well-written and interesting by two of the reviewers who also appreciated the novel results. One of the reviewers found the paper to be a rather incremental update of the prior work on FIOL, and the usage of temporal variability to be inappropriate in the static regret framework. In my own opinion, these doubts were properly addressed in the rebuttal.