NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:4090
Title:Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints

Reviewer 1


		
Relation to the Batched bandit problem: the authors claim that the relationship between the switching cost constrained MAB problem and the batched bandit problem as described in lines 143-154 is somewhat surprising. While the two problems are indeed different, this relation is not really that surprising. In fact there is a well known relationship between mini-batched bandit algorithms and bandit algorithms with switching costs in the adversarial setting. In particular, showing a lower bound for the MAB with switching costs for a certain type of mini-batched algorithms goes through a lower bound for the adversarial MAB with switching costs (see for example Bandits with Switching Costs: T^{2/3} Regret). The main idea behind the algorithm is to carefully divide the T rounds of the MAB game into segments. Before each segment a set of active arms is pruned according to a UCB/LCB strategy and then the arms from the active set are played an equal amount of times in the segment. The crux of the proof is choosing the length of each segment according to the bound on number of switches and number of arms. While the batch sizes induced by each segment might be novel, the proof seems standard. The main technical contributions begin with the lower bound proof of Theorem 2. The proof seems non-trivial and the use of stopping times according to when each arm has been pulled for the first time to my knowledge is novel. Since the proof of the lower bound is an important contribution of this paper, the paper might benefit from a lengthier explanation about how the proof works exactly or at least some more intuition as the paragraph in between lines 194-204 is not really helpful when reading the Appendix. For example the authors might want to explain more carefully how the stopping times come into the proof, as a case by case analysis and how the auxiliary policy \beta is constructed to show the lower bound. While the discussion about the Phase transitions and Cyclic Phenomena is insightful, the space in the main paper might be better used to expand on the proof of Theorem 2. The main paper will also benefit from a lengthier discussion about Algorithm 3 which is deferred to the appendix and the proofs of Theorems 3 and 4 as the proofs of these theorems seem non-trivial and a good contribution. Can the authors also clarify, how the bounds in Theorem 3 will change if the shortest Hamiltonian path is only approximately computed in Algorithm 3, instead of exactly? Overall the contributions of the paper seem significant and especially the proof of the lower bounds seem novel. The relation between arbitrary switching costs and the shortest Hamiltonian path is also maybe surprising. The paper’s presentation can be improved if the authors spend less time discussing implications of Theorem 1 and Theorem 2 and focus more on the proof techniques and Algorithm 3, which are currently deferred to the appendix. ----------------------------------------------------------------------------------------- I have read the response by the authors and other reviews. I am mostly satisfied with the rebuttal and the proposed reorganization of the paper. While I agree with the comments of other reviewers and it would definitely be nice to have a more thorough treatment of the phase transitions, I find that the rebuttal addresses most of the raised concerns and will keep my current score.

Reviewer 2


		
Originality: The scope of the paper is higher original in both achieving and discussing a regret bound with multiple phase changes. Notably this is achieved in a setting well motivated in literature and with a model corresponding to real use cases. Clarity: The paper is well written. The language is easy to read and informative while still being sufficiently explicit for instance when defining algorithms. While what is written is excellent the space management of the paper is decidedly poor. In my opinion this is the paper's main weakness: While the abstract and introduction promises treatment of the case of general switching costs and connections to graph traversal, this significant part of the paper is given half a page (section 5) with large portions of not just technicalities relegated to appendices. From my point of view this has not received a proper treatment in the main text of the paper, and I will largely consider it outside of the scope of the present text! With this removed it is however not unrealistic that the remaining scope could be accepted! Similarly it is shame that little space is devoted to proof techniques in the main text. Significance: The results are important in two ways: The model of external switching constraints is a significant framework for dealing with real life models. Secondly achieving a regret with phase transitions in this fashion is interesting in its own right and the paper puts a great deal of effort into the discussion hereof, which could spark interesting discussions of characterisations of bandits in general. -- Further comments and questions: 1) It would be nice with a more thorough treatment of the necessity of the phase transitions in the "actual regret scaling" compared to the asymptotic bounds shown here. In other words answering the question: Could there be a smooth regret scaling (i.e. not displaying phase transitions) for which the proven bounds still holds? 2) While the even spacing of phases is interesting, the terminology of "cyclical" seems a bit contrived. It is technically correct but seems oversold as a separat topic in the title rather than a characterisation of the phase transitions. ==== After the author response and reading the other reviews, my opinion is that the paper has significant contributions and the authors are willing and able to significantly reorganise the paper in a way to address our concerns. While a thorough reorganisation plan is sketched out in the response, it is my opinion that this is too significant of a change to accept without further review.

Reviewer 3


		
This is a novel result on the MAB problem with limited switch cost problem. The authors show a intuitive method to reduce the switch cost in the unit swtich cost case, and then prove that their method is optimal. They also show the constraint on the number of switches inside an MAB game. The regret is reduced a lot only when the budget of the switches has been larger by k-1, which is the minimum cost to try every arm in a period. Then they generalize their result to the case that the switch costs are arbitrary. In this case, and propose the HS-SE policy. This policy is still optimal in a lot of kinds of swtich cost case, and near optimal in other cases. This is a good paper that shows the relationship between number of switches and the culmulative regret. It is useful in some kind of real applications. The proofs are correct, and the writting is clear as well.