NeurIPS 2020

### Review 1

Summary and Contributions: This paper studies the problem of online influence maximization under the linear threshold (LT) model, which is rarely used in the literature compared to the independent cascade (IC) model. An important assumption of the paper is the type of feedback: it is assumed that each step of the diffusion is observed by the learning agent. Two algorithms are proposed: one based on the well-known LinUCB, and the other a variant of the ETC algorithm. Bounds on the regret are proven for both algorithms, and an experimental study is provided (in the appendix).

Strengths: This paper has the merit of tackling the LT model, which is much more difficult to analyze than the IC model and arguably better suited to some contexts. The proven regret bounds make sense in a setting with such limited feedback. The significance and relevance of the work seem far from negligible, given the technical challenges involved.

Weaknesses: My main concern is that the idea of using LinUCB in the LT model is not new (that said, the idea of an algorithm is not enough to prove its theoretical efficiency). I also have a small problem with the use of the "node-level feedback" terminology. To me, node-level feedback is simple to describe: the agent observes all the nodes influenced in round t, period. Here we have a little more: each step of the diffusion is observed, which gives slightly more information. I am not against this kind of setting, but I find it hard to see the added value of observing the whole diffusion over plain node-level feedback. In particular, it seems that the setting where only the influenced nodes are observed can, in principle, still benefit from a LinUCB-type algorithm: for each node, we observe a Bernoulli variable whose parameter is the sum of the weights of its influenced incoming neighbors. I think the main issue with this kind of feedback (the "true" node-level one) is that not all three terms in Theorem 1 would be covered (which is probably why the richer feedback is used in the algorithm via a random choice with probability 1/2). I may be mistaken, and I would like the authors to give me more explanation on the points above. A dedicated paragraph in the paper would, I think, also be very helpful.
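To make the Bernoulli observation above concrete, here is a minimal sketch (not from the paper; function and variable names are illustrative) of why, under the LT model with thresholds drawn uniformly from [0,1], a node's single-step activation indicator is a Bernoulli variable whose parameter is the sum of the weights of its active in-neighbors, i.e. exactly the kind of linear observation a LinUCB-type algorithm could regress on:

```python
import random

def lt_activation(in_weights, active_in_neighbors):
    """One LT activation check: the node activates iff the total weight of
    its active in-neighbors reaches a threshold drawn uniformly from [0, 1].

    in_weights: dict mapping in-neighbor -> edge weight (weights sum to <= 1)
    active_in_neighbors: set of currently active in-neighbors
    Returns a Bernoulli sample with parameter sum of active in-neighbor weights.
    """
    total = sum(w for u, w in in_weights.items() if u in active_in_neighbors)
    theta = random.random()  # uniform threshold, freshly drawn per diffusion
    return theta <= total

random.seed(0)
in_w = {"a": 0.2, "b": 0.3, "c": 0.4}
active = {"a", "c"}  # linear parameter = 0.2 + 0.4 = 0.6
n = 100_000
freq = sum(lt_activation(in_w, active) for _ in range(n)) / n
print(freq)  # empirical frequency, close to the linear parameter 0.6
```

The point of the sketch is only that the observed activation frequency matches the linear parameter, so node-level feedback alone already yields linear bandit observations.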

Correctness: From what I've checked, the analysis seems solid.

Clarity: This paper is otherwise pretty clear and easy to follow.

Relation to Prior Work: The various previous works seem to me quite well covered, and this paper stands out well in the contributions.

Reproducibility: Yes

Additional Feedback: My main suggestion is to provide more explanation as to the relevance and necessity of the feedback used. ---- After rebuttal: I acknowledge that I have read and taken the rebuttal into account.

### Review 2

Summary and Contributions: This paper studies online influence maximization (OIM) under the linear threshold (LT) model. OIM has hitherto been studied in the independent cascade model, and this is the first work to study it in the LT model. The paper proposes two algorithms, a UCB-style algorithm and an explore-then-commit (ETC)-style algorithm, and analyzes their regret. == after rebuttal == Thank you for your response.

Strengths: The paper is relevant to the community and explores an interesting direction for influence maximization. The theoretical claims are sound, and the contribution is novel.

Weaknesses: The paper lacks experiments. While I appreciate the theoretical contribution, it would be good to compare the empirical performance of the two proposed algorithms, including a comparison to a baseline that chooses seeds randomly. I also found the paper a bit hard to read. It took me multiple readings to understand that E_{t,1} and E_{t,2} correspond to the times just before the activation of the node. The paper would also benefit from distinct terminology for the two notions of time (t and tau in line 156). When the authors say "node-level feedback", one may think that the learner only observes the nodes that are activated at each time; however, the feedback here is more granular.

Correctness: Yes.

Clarity: No. Please look at the weaknesses pointed out.

Relation to Prior Work: Yes.

Reproducibility: No

### Review 3

Summary and Contributions: This paper considers the online influence maximization problem via a combinatorial MAB approach. Instead of the commonly used independent cascade model, the authors adopt a linear threshold model, where the threshold $\theta_v$ of each node $v$ is drawn independently and uniformly from $[0,1]$. Based on this model, they propose two algorithms, a UCB-based algorithm LT-LinUCB and an ETC (Explore-Then-Commit)-based algorithm OIM-ETC, together with the corresponding regret analyses. They also provide experiments to support their theoretical results.
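For readers unfamiliar with the model under review, here is a minimal sketch of one LT diffusion round as summarized above (thresholds resampled uniformly in [0,1] at the start of the round; all names and the toy graph are illustrative, not taken from the paper). It returns the step-by-step activation fronts, i.e. the granular feedback the paper's learner is assumed to observe:

```python
import random

def lt_diffusion(graph, seeds):
    """Simulate one round of linear threshold (LT) diffusion.

    graph: dict mapping node -> dict of (in-neighbor -> edge weight),
           with incoming weights summing to at most 1 per node.
    seeds: initial seed set.
    Returns the list of activation fronts, one per diffusion step,
    i.e. which nodes become active at each step of the diffusion.
    """
    # Each node draws its threshold uniformly in [0, 1] at the round start.
    theta = {v: random.random() for v in graph}
    active = set(seeds)
    fronts = [set(seeds)]
    while fronts[-1]:  # stop once a step activates no new node
        newly = set()
        for v in graph:
            if v in active:
                continue
            weight_in = sum(w for u, w in graph[v].items() if u in active)
            if weight_in >= theta[v]:  # cumulative weight reaches threshold
                newly.add(v)
        active |= newly
        fronts.append(newly)
    return fronts[:-1]  # drop the final empty front

# Tiny line graph s -> a -> b with weight-1 edges: activation is certain,
# so the diffusion advances one node per step.
g = {"s": {}, "a": {"s": 1.0}, "b": {"a": 1.0}}
print(lt_diffusion(g, {"s"}))  # [{'s'}, {'a'}, {'b'}]
```

Under this feedback the learner sees every front, not just the final activated set, which is the distinction between "each step is observed" and plain node-level feedback raised by the other reviewers.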

Strengths: This is a novel topic in bandit problems. The proofs of the theorems are correct, and the experiments support the theoretical results.

Weaknesses: My major concern is the model. In my opinion, either we follow a FREQUENTIST setting and suppose that both the edge weights $e_{u,v}$ and the thresholds $\theta_v$ are fixed but unknown constants in $[0,1]$, with the goal of learning them to achieve good performance; or we follow a BAYESIAN setting and suppose that both $e_{u,v}$ and $\theta_v$ follow some known prior distribution, with the goal of minimizing the Bayesian regret via online learning policies. In this paper, the edge weights $e_{u,v}$ are fixed and unknown, while the thresholds $\theta_v$ follow a uniform prior distribution on $[0,1]$. This is really strange. Unfortunately, the algorithms in this paper rely heavily on this half-frequentist-half-Bayesian setting. Because of this, I think the authors need to explain and discuss in more depth the motivation for this setting, related works that use it, and how this scenario may arise in reality.

Correctness: I checked most of the proofs in the supplementary file, and I think they are correct.

Clarity: The paper is well written.

Relation to Prior Work: It is clearly discussed.

Reproducibility: Yes

Additional Feedback: ============ After reading other reviews and the rebuttal ============ I understand the applicability of this model after reading the rebuttal and the other reviews. Therefore my score changes to "6".