__ Summary and Contributions__: The paper considers a graph clustering on directed graphs. The authors introduce a new notion of clustering objective denoted by flow ratio. For any ordered partition of vertex set V into k pairwise disjoint subset (S0, ..., Sk-1), the flow ratio of the partition is sum of the average flow (i.e. w(i, i+1)/Vol(Si)+Vol(Si+1)) along the path from S0 to Sk-1. The optimal clustering is the partitioning of V that maximizes the flow ratio among all possible partitions. The authors represent the directed graph using the Hermitian adjacency matrix. The main contribution of this work is proving that the optimal clustering is well-embedded with the first eigenvector of the corresponding Laplacian matrix. Equipped with this observation, authors design an algorithm to find clustering. The algorithm is very similar to the standard spectral clustering algorithms for undirected graph; It first estimates the embedding and then runs k-means on the corresponding embedding. They show that for graphs with few clusters (i.e., poly(log n)) the algorithm runs in nearly linear time. Moreover, they prove that the clustering returned by the algorithm has a small symmetric difference with optimal clustering where the recovery quality depends on the gap between the first and second eigenvalues of the Laplacian. Finally, the authors show how to improve the runtime of the algorithm by subsampling edges with probabilities proportional to the weight by degree.
In addition, they evaluate the performance of their algorithm on the stochastic block model and UN Comtrade Dataset.

__ Strengths__: This paper has a conceptual contribution in considering the flow ratio notion for clustering directed graphs. It is an initial step towards applying known spectral techniques for undirected graphs to the directed graphs with the help of complex Hermitian adjacency matrix. Although the algorithm and most of the theoretical analysis are very similar to the standard spectral clustering literature such as work by Peng, Sun and Zanetti, the novel observation of this work is embedding the points into the space of dimension 2 with the help of complex Hermitian matrix. This approach shows how to differentiate points from different clusters using angles, although to find the clustering for undirected graphs it is crucial to look at the subspace of dimension k that is spanned by the first k eigenvectors.

__ Weaknesses__: The authors succeed in showing that the standard spectral techniques can be applied to the directed graph clustering using flow ratio formulation. However, the notion of flow ratio is not well-motivated. It is not clear why one should consider a specific path between clusters. In other words, the relationship between pairwise different clusters is not captured in the objective function and it only takes care of consecutive terms which is a restrictive definition in my opinion. Moreover, since this is a new objective function for directed graph clustering the authors should justify the hardness of finding the optimal clustering, also they should compare it with other clustering objectives for directed graphs.
Regarding the experiments on stochastic block model, first, it's not clear how large the input graph is, also, the number of clusters (i.e, k) is a small constant, so it's not clear from the experiments if the sublinear algorithm scales very well on large graphs. Second, the number of different runs in Figure 1 seems to be very small which might not be enough to get a good concentration in the accuracy of the results. Finally, the authors evaluate the quality of their methods using the Adjusted Rand Index. It might be useful to report the quality of the generated output using other known evaluation methods for spectral clusterings such as inner and outer conductance, precision and recall.

__ Correctness__: Yes both theoretical claims and empirical methodology are correct.
In line 185 of the main file it's not clear for me why do you have (1+APT)(yk-1)? It shouldn't be APT*(yk-1)?

__ Clarity__: The paper is well-written, and the algorithms and proofs are well-explained.

__ Relation to Prior Work__: The authors clearly explain their theoretical contribution comparing to the previous works.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: After reading the authors' rebuttal, my evaluation remains the same.
The work studies clustering of directed graphs by spectral means. Traditional spectral clustering on undirected graphs seek to find dense clusters that induce small cuts. Instead here the goal is to find a directed structure, for example, a cluster that has mostly outgoing edges and one that has mostly incoming edges (so sparse clusters inducing large cuts are welcome). The authors introduce a notion of flow ratio and discuss algorithms to find a clustering that approximately maximize that ratio. The paper contains experiments where the aforementioned algorithms find patterns in international trade markets (e.g. clusters of crude oil exporting countries).

__ Strengths__: The topic is interesting and matches the NeurIPS audience. The techniques are nontrivial and yield interesting insights. The whole analysis and the algorithms seem theoretically well grounded. The spectral analysis is of interest on its own (I remark that in many points it bears similarities with previous work). Thus I believe the work connects nicely the original goal of clustering with the spectral analysis for directed graphs and with the resulting approximation algorithms. The experiments are interesting as well, and suggest that the proposed algorithm is indeed capable of finding the cluter structure that one would expect. In particular the authors recover the crude oil international market structure (say, exporters vs importers) using data involving 246 different countries and regions over several years.

__ Weaknesses__: The main weakness of the paper is that here and there the exposition is poor. In particular, it is hard to make the necessary connections and understand where the discussion is going. There are several technical results stated in a "standalone" form although they are clearly meant to be connected/used somewhere else. This makes it quite hard to understand the key ideas of the paper and in what they differ from previous work. In particular, many concepts and results are similar to [22] (Peng et al @ SICOMP 17).

__ Correctness__: I could not verify the proofs of the claims although the results make sense. There is at least one mathematical expression that I found suspicious since it is undefined (division by 0) when k=2 and G has a single edge or is bipartite (see detailed comments below). I do not know if this is only a corner case.

__ Clarity__: The paper is fairly well written, but as I said it is generally hard to understand where the discussion is heading. There are also a few sentences that are confusing or incorrect (for example, at the very beginning of the introduction the paper presents a complicate expression and none of the symbols involved were defined before). I think the presentation can be improved.

__ Relation to Prior Work__: Yes and no. The authors acknowledge previous work in the sense that they correctly point to the relevant papers. What leaves me in doubt is that it is not clear what is similar to what. For example, the authors introduce in Equation (3) a kind of indicator vector for the clusters. Similar things are standard, so this is an (interesting) variation, but this it is not written. (Specifically, the equation just above (3) is the standard normalized indicator vector, and (3) does a linear combination which is the novelty part). This leaves the non-specialist reader like me in doubt; what is conceptually new, and what is just an application of standard techniques?

__ Reproducibility__: Yes

__ Additional Feedback__: Detailed comments:
- line 42: "any set of vertices" is confusing, here there is a partition
- lines 43-35: I could not understand this expression as none of the involved symbols has been introduced except S_j
- lines 73-74: "in particular we have vol(G) = vol(V) = 2m" this is not true since you define d() as weights instead of degrees
- lines 98-99: "the fraction of directed edges with endpoints in S_j or S_{j-1} which cross the cut from S_j and S_{j-1}" this sounds like the direction of the edge is irrelevant (it is also oddly phrased)
- line 122: this expression seems to have a 0 denominator, for example when k=2 and G is a single edge; this is strange
- lines 138-139: "we propose to apply k-means on the embedded points from f_1" I do not understand this sentence, in particular "from f_1" and what points the sentence is talking about
- S 4.2: I cannot understand how the three lemmata imply the theorem; I see they are related, but they are just listed without explaining the connection

__ Summary and Contributions__: This paper studies k-way partitions of directed graphs using k (complex) eigenvectors of the directed Laplacian. It gives bounds similar to those for k-way partitions of undirected graphs, and demonstrates good performances of the algorithms experimentally on small synthetic and real world graphs.

__ Strengths__: Extending spectral clustering into directed graphs is a difficult task: it's quite surprising to me that spectral techniques and their guarantees can be readily extended to directed graphs. The experimental verifications of the algorithms are fairly through.

__ Weaknesses__: Aside from the extension to the complex domain, it's not clear what are the new ideas in this paper compared to spectral methods for (k-way) partitioning undirected graphs. The experiments were mostly for small sized graphs, and it's not clear how effective the algorithm would scale to larger instances. Even the largest dataset for oil trading only involves < 300 vertices (but with about 100 commodities): I suspect the data becomes much smaller after the input is converted to graph form. If the data size is indeed large, it would be useful to directly mention the sizes of the intermediate graphs generated.

__ Correctness__: I believe the algorithm and the bounds claimed. The experimental set up is sound, although I'd like to have seen some comparisons with other methods such as pagerank or matrix factorization based methods. While those methods are not geared toward the flow ratios studied here, I believe their output clusters can still be compared directly.

__ Clarity__: The paper introduces the problem and methods clearly, but I find some of the formal components a bit lacking: for example, k was assumed to be O(log^{c}n) on line 144 and omitted from subsequent bounds involving running times. A pseudo-code of the overall algorithm is also not present in sec 4: this made it difficult to figure out the overall algorithmic picture.

__ Relation to Prior Work__: Prior works, especially for spectral partitioning of directed graphs, are discussed systematically.

__ Reproducibility__: Yes

__ Additional Feedback__: I believe the paper has some interesting ideas, but have difficulty figuring out its improvements over previous results. I feel there are enough theoretical new ideas for the paper to be accepted, although also feel that the ideas are still an iteration away from being directly used on large networks.