Paper ID: 2395
Title: Multistage Campaigning in Social Networks
The authors consider a model of electoral campaigning in social networks. They formulate and solve a multi-stage optimal stochastic control problem which aims at maximizing either the minimum or total exposure to the campaign.
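For concreteness, the two exposure objectives this summary mentions can be written schematically as below; the notation is my own hedged sketch (eta_i(T; u) for user i's expected exposure at horizon T under intervention u), not necessarily the paper's:

```latex
\max_{u} \; \sum_{i=1}^{n} \eta_i(T; u)
\qquad \text{vs.} \qquad
\max_{u} \; \min_{1 \le i \le n} \eta_i(T; u)
```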
The problem solved is interesting, and there is some interesting modelling and mathematics involved as well. It is not entirely clear how applicable these results are in a real situation, but this is acknowledged in the paper. The presentation and the language are also pretty good, although a careful proofread would not be a waste of time. In Theorem 3, I am not sure what is meant by "piecewise absolutely continuous".
2-Confident (read it all; understood it all reasonably well)
This paper uses Hawkes processes to model user activities in social network campaigns. It further considers the problem of implementing interventions in multiple stages to shape exposure. A convex dynamic programming framework is developed to determine the optimal intervention policy that achieves the desired campaigning results.
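Since the multivariate Hawkes model is the core object here, a minimal simulation sketch may help fix ideas. It uses standard Ogata thinning with an exponential kernel; all names (`simulate_hawkes`, `mu`, `A`, `omega`) are illustrative assumptions, not the authors' code:

```python
import numpy as np

def simulate_hawkes(mu, A, omega, T, seed=0):
    """Ogata thinning for lambda_i(t) = mu_i + sum_{t_k < t} A[i, u_k] * exp(-omega*(t - t_k))."""
    rng = np.random.default_rng(seed)
    n = len(mu)
    events = []                 # recorded (time, user) pairs
    exc = np.zeros(n)           # self-excitation component of each user's intensity
    t = 0.0
    while True:
        lam_bar = mu.sum() + exc.sum()     # upper bound: intensities only decay between events
        w = rng.exponential(1.0 / lam_bar)
        exc = exc * np.exp(-omega * w)     # decay the excitation to the candidate time
        t += w
        if t > T:
            return events
        lam = mu + exc
        if rng.uniform() < lam.sum() / lam_bar:    # accept the candidate event
            i = int(rng.choice(n, p=lam / lam.sum()))  # attribute it to a user
            events.append((t, i))
            exc += A[:, i]                 # an event by user i excites all users

# Example usage with a 2-user toy network:
mu = np.array([0.5, 0.2])
A = np.array([[0.1, 0.3],
              [0.2, 0.1]])
events = simulate_hawkes(mu, A, omega=1.0, T=10.0)
```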
The design of the experiments, especially the evaluation on the real data, may not be sufficiently relevant. According to the output of Algorithm 1, the goal is to find the set of optimal interventions u_0, ..., u_m. It is understandable that carrying out a real intervention on a social media platform would be very challenging. Nevertheless, it is not clear why predicting which cascade will better reach the objective is a relevant way to assess the effectiveness of the proposed algorithm. Moreover, the experimental results are not discussed sufficiently. For example, the three subfigures in the left column of Figure 2 show fluctuations in prediction accuracy, yet the results are simply summarized as "the performance drops slightly with the network size". More discussion is needed to understand the reasons behind these fluctuations; for example, do they relate to instability of the algorithm?
1-Less confident (might not have understood significant parts)
Hawkes processes have proven useful in financial applications, and limit-theoretic results for them have been obtained in probability theory in the past few years. In particular, this class of processes can capture jumps in a way that is not possible with models such as Brownian motion. The paper models the activity of social network users as a multivariate Hawkes process and links it to typical cost functions. Experiments on synthetic and real networks are provided.
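For reference, the standard conditional intensity of a multivariate Hawkes process with exponential kernel takes the following form (the notation is assumed here, not quoted from the paper):

```latex
\lambda_i(t) \;=\; \mu_i(t) \;+\; \sum_{k:\, t_k < t} a_{i u_k}\, e^{-\omega (t - t_k)}
```

where \mu_i(t) is the (possibly time-varying, stage-wise) exogenous intensity, a_{ij} the mutual-excitation weight from user j to user i, u_k the user of the k-th event, and \omega the decay rate; the sum over past events is what produces the jump behavior mentioned above.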
The paper provides strong connections in novel areas. However, the experiments were carried out only on problems of very small scale compared to what one would ideally desire. This raises concerns about scalability, which is an issue that has to be resolved. It would be great to see both more theoretical justification for scaling and practical observations, if possible.
1-Less confident (might not have understood significant parts)
This paper introduces a very interesting topic, i.e., multistage campaigning in social networks, combining social network modelling, optimization, and closed-loop dynamic programming. This is a difficult problem because multiple factors have to be considered: for example, the user relation network, the temporal information, and the uncertainty of user behavior. The paper uses temporal point processes to model the problem, together with techniques such as optimization and dynamic programming. It also provides experiments on both synthetic and real-world datasets to demonstrate the application of the proposed model.
This paper is really interesting and would make a great contribution if it could be applied to real-world social networks like Twitter or Facebook. The contribution is clearly demonstrated and the idea is interesting and innovative. However, the experiments are not sufficient. First, the baselines compared against should be described briefly instead of only giving their abbreviations. Second, does the vertical line on top of each bar in Figure 1 represent the standard deviation or a 95% confidence interval? If two algorithms' confidence intervals overlap, they are not significantly different from each other; according to the graph, the proposed algorithm's confidence interval overlaps with those of the other algorithms, so more explanation of this issue is expected. Third, if the vertical line represents the standard error, then CLL has a larger standard error than RND and OPL according to Figure 1(a), and it would be good to explain why the standard error of the proposed algorithm is not as small as that of the other algorithms. In addition, although a Conclusion section is not strictly necessary, it would help future readers to have a brief summary of what is achieved in this paper.
1-Less confident (might not have understood significant parts)
The paper builds on existing work in shaping social activity on social networks modeled as multivariate Hawkes processes (developed in citation [8] for constant-in-time control schemes). The paper characterizes a relationship between time-dependent exogenous activity and the average activity rate in the network for exponential kernels, and uses this to derive an approximate dynamic programming control algorithm for driving activity using piecewise-constant control signals. The paper compares the performance of its control algorithm to baselines on synthetic networks with simulations, as well as on real data (using a custom performance metric), and shows promising, though somewhat difficult to interpret, results.
The theoretical contributions (the relationship between time-dependent exogenous intensity and average activity) appear significant, and the use of this result to derive a closed-form control algorithm appears to be a contribution to the field of shaping activities in social (and other) networks modeled as multivariate Hawkes processes.

Regarding the clarity of this paper: the paper does not frame itself relative to prior work clearly. Clearly indicating where and how this work is a generalization of prior work [8] would improve the clarity of the paper and highlight the core contribution of handling *time-dependent* exogenous events. Stating that the paper "establishes theoretical foundations of optimal campaigning over social networks where user activities are modeled as multivariate Hawkes processes" does not properly localize the work relative to [8]. This could be improved in part by mentioning [8] in the introduction, and specifically stating that prior work has done optimal control with constant exogenous intensity and that this paper addresses multi-stage exogenous intensity. Similarly, in Section 3, "From Intensity to Average Activity", it should be made clear that this is an extension of a previous analogous result for constant-in-time activity. Finally, the three objective functions of Section 4 appear to mirror the three objective functions from [8], but the text does not reveal this when it states "In what follows, we introduce several instances of the objective function...". These changes may help the reader focus on the core and cohesive technical contribution.

The description of the closed-loop approximate dynamic programming control algorithm requires more precise language. For example, lines 241-252 are unclear: the excerpt "By observing the counting process in previous stages summarized in a sequence of x_m) and taking future uncertainty into account, the control problem is..." needs to be more explicit (using mathematical variables) about what specific data is given to the control algorithm and which specific random variables cause the "future uncertainty". For example, in line 257: "However, when control policy \pi_m is to be implemented, only x_m is observed and there are still uncertainties in future..." Why is this a problem? Assuming "implemented" means that the policy was already determined, isn't implementing the policy simply a matter of evaluating the function \pi_m on the input x_m? Being clearer about the *computation* represented by Equation 19 and the *computation* represented by Equation 18 would make it easier to see how Algorithm 1 is derived in a stage-wise process (e.g., on line 268: "Solving for ... is analytically intractable [due to ...]").

How significant is the choice of exponential kernel? It seems it may be critical to the control algorithm. If this is indeed a critical requirement, it deserves more discussion besides just calling it "standard".

Based on the text, the error bars in Figure 1 seem to represent the standard deviation and not a form of confidence interval, but this could be made more explicit in the figure caption. Only the acronyms of the baselines used in Figures 1 and 2 are given in the main text; the identity of these baselines could be briefly stated in the caption of Figure 1 (with citations).
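To illustrate the stage-wise computation asked for above around Equations 18-19, here is a generic closed-loop dynamic programming sketch; it is my own illustration (the names `cost`, `dynamics`, `terminal` and the Gaussian disturbance are assumptions), not the paper's Algorithm 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_at_stage(x_m, m, M, controls, cost, dynamics, terminal, n_samples=50):
    """Pick the control u_m after observing x_m at stage m, via the recursion
    V_k(x) = min_u [ cost(x, u) + E V_{k+1}(dynamics(x, u, xi)) ],  V_M = terminal.
    The expectation over the future randomness xi is replaced by a sample
    average; runtime is exponential in M - m, so this is purely illustrative."""
    def value(x, k):
        if k == M:
            return terminal(x)
        return min(q_value(x, k, u) for u in controls)

    def q_value(x, k, u):
        xis = rng.standard_normal(n_samples)   # placeholder disturbance model
        return cost(x, u) + np.mean([value(dynamics(x, u, xi), k + 1) for xi in xis])

    return min(controls, key=lambda u: q_value(x_m, m, u))
```

In this sketch, "implementing" the policy at stage m is indeed just evaluating the function at the observed x_m; the remaining future uncertainty enters only through the sampled expectation inside the recursion, which is the kind of distinction the prose in lines 241-257 should make explicit.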
Regarding the potential impact of the work: the experimental results on synthetic data did not seem to represent as significant an improvement over open-loop, one-time optimization over all stages as might be expected, given that the closed-loop algorithm has access to significantly more information. A demonstration on network parameter settings that better highlight the novel algorithm's strengths relative to the baselines, paired with a demonstration on network parameter settings for which the distinction is less striking, would make the potential impact of the technique clearer. A discussion of the contexts in which the proposed algorithm is likely to significantly improve performance would also strengthen the clarity and make it easier to assess impact. The experimental results on real-world data are based on a performance metric of "predicting which cascade will reach the objective function better. They should be able to answer this by measuring how similar their prescription is to the real exogenous intensity...". This metric needs a clearer and more mathematically precise definition; the results of Figure 2 seem promising, but they are hard to interpret given the lack of clarity in the prose description of this key metric.
1-Less confident (might not have understood significant parts)
This paper studies the problem of multi-stage campaigning optimization in social networks. The activities of users are modelled as a multivariate Hawkes process (MHP). The paper shows that the intensity of exogenous events and commonly used objective functions are connected through a time-dependent linear relation. A convex dynamic programming framework is used to compute the optimal intervention policy, which prescribes the required level of external drive to obtain a desired campaigning exposure profile.
Peer influence and external exposure in social networks is an interesting topic. The MHP model adopted in the paper has been applied to this problem in various social network papers. The contribution of this paper is primarily the dynamic programming framework for the multi-stage optimization problem. The derivation seems to be correct and the experiments are reasonable. The major issue is that the paper is not self-contained: the related work and conclusion sections have been moved to the supplementary material, and a few citations are never discussed in the paper because they only appear in the related work. I do think the related work belongs in the main body, since the problem studied in this paper is a popular topic in social network research. Other comments:
1. Approximate dynamic programming is applied to solve (18), but there is no theoretical guarantee on the performance of this approximation algorithm. An approximation ratio should be discussed; otherwise, it is hard to evaluate the contribution of this algorithm.
2. Scalability is very important in social network analysis due to the scale of the networks involved. The computational complexity of the proposed algorithm is unclear and should be discussed.
2-Confident (read it all; understood it all reasonably well)