Review for NeurIPS paper: Delay and Cooperation in Nonstochastic Linear Bandits

NeurIPS 2020

Delay and Cooperation in Nonstochastic Linear Bandits

Review 1

Summary and Contributions: The paper mainly considers the adversarial linear bandits with delay feedback. The algorithm adopt the distribution truncation technique, which gives the optimal regret bound up to logarithmic factors. Also the algorithm considers the cooperative bandits. Lower bounds corresponding to two settings are also provided.

Strengths: The algorithm is simple and strong, which is promising to apply to different setting.

Weaknesses: 1. It looks like prior work combining cooperative and delay model together, while this paper only considers them separately. 2. The computation issues. Although log-concave distribution can be sample in polynomial time, it still requires more computation time than other discrete linear bandit algorithms, not to mention the algorithm has to sample multiple times per round in order to truncate the original probability distribution. Also, it is not clear to me how to compute S(p')^-1 efficiently.

Correctness: 1. It is not clear to \hat{\ell} is unbiased. The point is that S(p_t')^-1 may not exist. I think a fix is that we have some assumption on the feasible region, but it definitely affects the generality of the algorithm. 2. Moreover, the computation of S(p_t')^-1 can also be problematic. I am not sure it can be calculated efficiently. It is not even a log-concave distribution. 3. In Lemma 4 and equation (27), it is not trivial the inequality can be applied. There should be some condition on eta and gamma. But I guess it is okay as the algorithm truncate the distribution, but I think the authors may need to explain more.

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: After reading the authors' responses, I think the authors adequately answer my questions. Now I have no concerns about the correctness and the significance of the contribution. I am voting for accepting this paper. However, I hope the authors can add more details on approximation of S(p'_t)^-1 and the time complexity in the final version.

Review 2

Summary and Contributions: The paper considers linear bandits in dimension m with delays d. Trivially one can obtain a bound sqrt(m*m*d*T), where one m comes from measuring the volume of the action set, and the term m*d comes from the variance calculation. The present papers proves that in fact the variance can be m+d, matching a COLT 2016 paper by Cesa-Bianchi et al. for the basic multi-armed bandit case (that is, they had proved K+d for the variance).

Strengths: I find the truncation trick to be very interesting. To my knowledge this is the first time this is applied to the linear bandit setting, and it seems much more natural than the spanners type of approaches. It would be nice to discuss whether it relates to the focus region of [Bubeck, Eldan, Lee, STOC 2017].

Weaknesses: The contribution can be viewed as narrow, but it is not a significant weakness since the problem is also fundamental.

Correctness: I have read some of the proofs and they are correct.

Clarity: The paper is well written.

Relation to Prior Work: In addition to the reference BEL17 that should be discussed, I think in table 1 the no-delay linear bandit bound should be attributed to [15].

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: This paper studies the non-stochastic linear bandit problem with a fixed delay horizon d. They provide up to log factors matching upper and lower bounds. They also extend the results on cooperative bandits to the linear setting. Rebuttal update: No change in evaluation.

Strengths: This is the first result for non-stochastic linear bandits with delays. It is highly non-trivial and requires a novel technique to adapt the mwu algorithm.

Weaknesses: The paper only considers the ``easy'' setting of fixed and known delay d for all time steps. In the multi-armed bandit literature, the more advanced problem of an unknown cumulative delay budget D has been studied. It would be nice if the authors mention this line of work and explain the difficulties arising in the linear setting.

Correctness: I did not check the proofs of the Lemmas, but the main body seems fine.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: