Paper ID: | 6251 |
---|---|

Title: | Optimizing Generalized PageRank Methods for Seed-Expansion Community Detection |

[Originality] Overall, this paper is well written. The first contribution regarding the convergence property of GPR is an improvement over the work of Avrachenkov et al. [23], where the authors drop or weaken some of the conditions. In the second contribution, the authors consider seed-expansion community detection using GPRs, as an improvement over Kloumann et al. [20] who consider the mean-field LPs and find the optimal value of \alpha for PPR. The authors also try to incorporate the variance as well as the mean, resulting in the use of pseudo Fisher's linear discriminant, and the best weighting under SBM is explored. Empirically observing the asymptotic behavior of the variance, the authors propose to use a weighting that increases as k grows, unlike PPR and HPR. [Quality & Clarity] * Although theoretical results including Lemma 3.2, Theorems 3.3 and 3.4 are quite condensed, the implications made from them seem to help the reader understand the contents. * Characterization of the variances is reasonably described; theoretical upper bounds do not significantly differ from the empirical observations. * Some experimental results on algorithm comparison are not much convincing. On Citeseer, Cora, and PubMed, the authors compare IPR (\theta=0.99) to PPR (\alpha=0.95) and HPR (h=10); here the result for PPR (\alpha=0.99) is missing. Clearly, the examined PPR's weight decays much faster than the examined IPR's weight, e.g., 0.95^50=0.08 and 0.99=0.6, and thus, this PPR would not provide a significant change for large k (>50). Also, if my understanding is correct, PPR and IPR demonstrate almost identical performance when \theta and \alpha are nearly equal to 1, and thus, I suspect that IPR (\alpha=0.99) is in fact close to PPR (\theta=0.99). [Minor issues] * Line 136 is confusing: Does the inequality mean that "$\bar{d}_v = \omega(1)$ and $\bar{d}_v \leq (1-\epsilon)n$"? * The small figure embedded in Figure 2 is too small to read. * The markers in Figure 3 are too small. ==== UPDATE ==== Thank you for providing the feedback. The experimental comparison between IPR (0.99) and PPR (0.99) is pretty convincing; it sufficiently demonstrates the empirical difference between them. Also, the feedback clarifies the theoretical insights behind IPR again.

1. originality The theoretical results seem to be an improvement over existing results, however, the assumption still seems strong to me. For example, it requires log n * d_max = o(d^2_min) and d_min = w(log n) which does not hold if d_max=log^2 n and d_min = log_n. It means the result only apply on dense graph and not allowing heterogeneous degrees 2. quality The submission seems technically sound, but I did not check the detailed proofs 3. clarity The paper is clearly written, however, a written algorithm (in supplementary if there is no enough space in main paper) on the proposed seed expansion method will make the paper easier to be referenced 4. significance This paper could help practitioners to better understand pagerank, the concentration result can inspire future work which may relax the assumptions

This paper is well written with theoretical and methodological improvements on GPR. (1) They derived non-asymptotic bounds for LPs and GPRs, which helps understand previous GPR-based methods. They also generalize the analysis of standard PR methods with fewer assumptions. (2) The proposed IPR method, where they take into account the variance of LPs when selecting weights in the first several steps of RW. They showed IPR has better performance for SE community detection, by simulations and real data on networks with different settings.