Paper ID: 1853

Title: Communication-Optimal Distributed Clustering

The paper considers the communication complexity of distributed clustering in two models of communication: point-to-point (message passing) and broadcast (blackboard). Suppose there are n nodes to be clustered, with the edges partitioned across s sites. For the message passing model, an algorithm is presented that uses about ns polylog(ns) bits of communication; for the blackboard model, an algorithm is presented that uses about (n+s) polylog(ns) bits. Lower bounds of Ω(ns) and Ω(n+s) are also shown for the respective models. Several extensions of these results are presented, along with experimental verification on real datasets.

There is one point I did not understand. The description of the experimental results says: "For example, when we partition a graph with over 70 million edges (the Sculpture dataset) into 30 sites, only 6% of the input edges are communicated in the blackboard model and 8% are communicated in the message passing model, while the values of the normalized cut (the objective function of spectral clustering) given in those two models are at most 2% larger than the one given by the centralized algorithm, and the visualized results are almost identical." This seems inconsistent with the theoretical result that the blackboard model should have a huge advantage (n+s versus ns). What explains this discrepancy?
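To make the question concrete, a back-of-the-envelope calculation (illustrative only; the value of n below is a hypothetical placeholder, since the review does not give the node count) contrasts the ratio of the two upper bounds with the ratio of communicated-edge fractions reported in the experiment:

```python
# For n >> s, the ratio of the two upper bounds, ns / (n + s), approaches s,
# suggesting a ~30x gap between the models, while the experiment reports
# only 8% vs 6% of edges communicated.
n = 10**6   # hypothetical node count; any n >> s gives nearly the same ratio
s = 30      # number of sites, as in the Sculpture experiment

bound_ratio = (n * s) / (n + s)   # approximately s for large n
observed_ratio = 0.08 / 0.06      # message passing vs blackboard fractions

print(round(bound_ratio), round(observed_ratio, 2))  # 30 vs 1.33
```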

2-Confident (read it all; understood it all reasonably well)

The paper proposes several distributed clustering algorithms and studies their communication complexity in two models (point-to-point and blackboard). The authors prove lower and upper bounds for graph spectral clustering and for geometric clustering (e.g., k-means, k-median, ...). The algorithms are implemented and evaluated on large graphs (with more than 70 million edges).

In this paper, a large dataset is maintained across s sites that communicate either in a point-to-point protocol (private message passing between the sites and one coordinator) or in a blackboard protocol (messages broadcast to a centralized site). The proposed algorithm for spectral graph clustering in the message passing model is based on spectral sparsification; the contribution is to show that the sparsification can be computed locally. The proof that clustering the union of the sparse subgraphs is close to clustering the whole graph relies heavily on [19, Peng et al., COLT 2015]. The lower bound is based on a reduction from the multiparty set disjointness problem. The algorithm in the blackboard model relies on the iterative construction of a sequence of O(log n) matrices whose last element is a spectral sparsifier. The presentation of this algorithm is less clear than the previous one; in particular, it is hard to follow what can be done locally and when communication with the coordinator is needed (I think a communication round is needed at each step to compute \tau_i). In both cases, the proofs in the appendix seem necessary to understand the algorithms, so the presentation could be improved. The experiments confirm the theoretical results. I think the paper is interesting and deserves publication.

Typos and technical remarks:
- Lines 469/471: \gamma_u should be \lambda_u. Line 475: I don't understand the append operator.
- I think a concluding theorem could be added to Section 3.2.
- It would be clearer if you announced sooner that the reduction encodes X_i^j = 1 if item j *is not* in site i.
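The message-passing protocol described above (each site sparsifies its local subgraph, and the coordinator clusters the union) can be sketched as below. This is a minimal illustration of the communication pattern only: the uniform edge sampling is a placeholder for a real effective-resistance-based spectral sparsifier, and `keep_prob` is an illustrative parameter, not from the paper.

```python
import random

def sparsify(edges, keep_prob):
    """Placeholder for a spectral sparsifier: keep each weighted edge
    (u, v, w) with probability keep_prob and reweight it by 1/keep_prob,
    so edge weights are preserved in expectation. A true (1+eps)-spectral
    sparsifier would instead sample by effective resistance."""
    return [(u, v, w / keep_prob) for (u, v, w) in edges
            if random.random() < keep_prob]

def message_passing_round(site_edges, keep_prob=0.1):
    """Each site sends only its local sparsifier to the coordinator
    (one message per site); the coordinator forms the union, on which
    spectral clustering would then be run (clustering step omitted)."""
    union = []
    for edges in site_edges:
        union.extend(sparsify(edges, keep_prob))
    return union
```

The communication saving comes entirely from each site transmitting its linear-sized sparsifier rather than its full edge set.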

1-Less confident (might not have understood significant parts)

The paper presents communication-optimal clustering algorithms for several distributed models, together with matching lower bounds on communication complexity and an experimental evaluation.

I really liked this paper. The problem is quite natural, and the use of spectral sparsifiers in this distributed setting is a good idea. There are strong guarantees on the algorithms' behavior, and the experimental results are quite compelling.

2-Confident (read it all; understood it all reasonably well)

This paper explores distributed graph and geometric clustering problems in the message passing and blackboard communication models. The authors propose new distributed graph clustering schemes for both models and prove their optimality. For the distributed geometric clustering problem, they prove the optimality of existing schemes in the message passing model and propose a new scheme for the blackboard model that achieves an O(1)-approximation.

A major problem of this paper is its heavy notation, which sometimes makes the paper really hard to follow. The authors group many definitions in the preliminaries section and then use them without referring back. Also, the paper should be self-contained without the supplementary material; presenting results only in the supplementary material (e.g., the scheme for Theorem 4.3, Figures 6 and 7) is not acceptable. Some specific concerns and comments:
1. The embedding F(v) in Line 170 is confusing.
2. E_i' in Line 176 is not defined.
3. In Theorem 3.1, the communication cost is measured in "words", a unit that is undefined and inconsistent with the previously used "bits".
4. K^+ in Line 228 is not defined, and it is not clear how building a chain of matrices as in (4) relates to the distributed graph clustering algorithm in the blackboard model.
5. The second paragraph of Section 4.1 is very confusing. If S_i is a subset of [n], what is the characteristic vector of S_i?
6. The figures in the experiments section are of poor quality; the axis labels and legends are hard to read.

2-Confident (read it all; understood it all reasonably well)

The authors prove lower bounds on the communication cost of clustering tasks in point-to-point and broadcast settings. The clustering tasks considered are clustering a graph that satisfies certain spectral "clusterability" properties, as well as three point clustering tasks (k-means, k-median, k-center). The authors then present algorithms for these tasks and prove they are within poly-logarithmic factors of the communication lower bounds. Experimentally, the authors verify that their algorithms recover clusters well for both tasks and have lower communication costs than simply sending all data to one coordinator.

With 6.5 pages of technical supplementary material, this feels like an attempt to cram two papers into one. As a result it was difficult to read, because definitions and explanations are split between the paper and the supplementary material. Splitting this into one paper that proves the lower bounds on communication cost and another that presents the clustering algorithms might make sense.

The algorithm for the graph clustering task in the point-to-point communication model is not clearly presented. Line 175 reads "every site P_i computes a linear-sized (1+\Theta(1))-spectral sparsifier H_i of G_i." It is unclear what \Theta(1) means here. Is it 0.5, 10, or 1000? What value was used in the experiments section?

In the experiments section, the graph clustering algorithm is validated only on synthetic datasets. Do the spectral separation properties assumed to motivate the graph clustering algorithm hold in these datasets? What happens on real datasets? Furthermore, it is stated that the edges are randomly partitioned among the sites. Is there variance in the results in this case? What happens if the edges are partitioned in an adversarial manner?

1-Less confident (might not have understood significant parts)

The paper describes a way to decentralize clustering algorithms, using a message passing model and a blackboard model. Experiments on three clustering datasets show that the proposed algorithms are more communication-efficient than the baseline clustering algorithm.

Technical quality: Experiments are appropriate but incomplete; it is not clear from the experiments section whether the distributed clustering algorithms improve upon a centralized version. Proofs are sound.
Novelty/originality: Novel method (as far as I can see).
Potential impact or usefulness: The paper describes a way of making clustering algorithms more communication-efficient, so impact or usefulness depends on the use of clustering algorithms in the community.
Clarity and presentation: Conclusion missing! Very hard to follow.

1-Less confident (might not have understood significant parts)