Paper ID: | 7423 |
---|---|

Title: | Optimal Sampling and Clustering in the Stochastic Block Model |

This paper proposes the Adaptive Spectral Partition algorithm for stochastic block models. To the best of the reviewer's knowledge, this adaptive approach is novel for asymptotically exact recovery in the SBMs. Additionally, the authors have showed that compared to regular sampling, this adaptive sampling approach allow one to recover the clusters in a larger regime. The result is interesting to me. Does that imply one is "wasting" some information by using random sampling? The main results seems tight. The authors compare their adaptive sampling algorithm to the regular clustering algorithm on the symmetric SBM with two clusters experimentally. The paper is well written and easy to follow. The proofs employ hypothesis testing, and connect its error rate to the log-likelihood ratio of two constructed models. Overall I believe the paper is a good contribution to the SBM literature.

The paper proposes an optimal edge sampling strategy for exact clustering SBM networks which are an important baseline for clustering. The authors find the upper bound on the cluster recovery rate and use those to device a sampling strategy in a spectral algorithm. The authors provide rigorous analysis in their paper for an important problem. It is also a new approach which unlike prior work on SBM clustering, provides a recovery algorithm based on a sampling strategy and not just uses it as a mean to prove claims on clustering feasibility. There are two fundamental things that are problematic in this manuscript – its readability and numerical validation: 1. The readability of the paper: while the authors make some sincere effort to explain some of their main results, the reader is flooded but very many technical statements some of which are not always motivated or explained even intuitively. for example in the presentation fo the Main result in page 2 -x or x_ij are not defined yet. Also, many steps and quantities in the algorithm are not clearly motivate and explained e.g. the setting ot T and delta, p_ii and so on. Hence the reader find himself flooded with quantities which are not straightforward to understand 2. It is stated in the abstract that it is numerically shown that adaptive edge sampling is shown to improve over random. However, no numerical validation or any kind or experimental validation for the sampling strategy is presented in this paper. This is a necessity for such a paper providing an algorithmic solution to a benchmark problem. Post Review: the authors have addressed some of my concerns and intend to modify adequately the final version. I have raised my score accordingly.

I have to say I'm not an expert on theorectical adaptive sampling in the SBM. I didn't check the math details, and may not fully understand the math sigficance of the paper. But from my experience on research, I feel this paper is good. The writing is well organized and easy to follow. The studied problem of adaptive sampling is important. The results are comprehensive and promising. The proposed algorithm should be inspiring for practical application.