NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper studies the adversarial cooperative bandit problem, previously studied by Cesa-Bianchi et al. In this problem, each of N users pulls an arm of their choice at every time step and receives an adversarial reward X_k(t), which is common to all users pulling arm k. At each time step, the users can then communicate some information to their neighbors in a given communication graph. Cesa-Bianchi et al. provided a sublinear bound on the collective regret for this problem. This paper proposes an algorithm that achieves sublinear individual regret, answering an open question. The reviewers liked the paper and thought that the model is interesting and the contribution is solid.