Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)

*Elad Hazan, Satyen Kale*

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in \cite{AbernethyR09}, which is parameterized by a scalar (\alpha). We prove that the regret of \newtron is (O(\log T)) when (\alpha) is a constant that does not vary with horizon (T), and at most (O(T^{2/3})) if (\alpha) is allowed to increase to infinity with (T). For (\alpha) = (O(\log T)), the regret is bounded by (O(\sqrt{T})), thus solving the open problem of \cite{KST08, AbernethyR09}. Our algorithm is based on a novel application of the online Newton method \cite{HAK07}. We test our algorithm and show it to perform well in experiments, even when (\alpha) is a small constant.

Do not remove: This comment is monitored to verify that the site is working properly