{"title": "A class of network models recoverable by spectral clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 3285, "page_last": 3293, "abstract": "Finding communities in networks is a problem that remains difficult, in spite of the amount of attention it has recently received. The Stochastic Block-Model (SBM) is a generative model for graphs with communities for which, because of its simplicity, the theoretical understanding has advanced fast in recent years. In particular, there have been various results showing that simple versions of spectral clustering using the Normalized Laplacian of the graph can recover the communities almost perfectly with high probability. Here we show that essentially the same algorithm used for the SBM and for its extension called Degree-Corrected SBM, works on a wider class of Block-Models, which we call Preference Frame Models, with essentially the same guarantees. Moreover, the parametrization we introduce clearly exhibits the free parameters needed to specify this class of models, and results in bounds that expose with more clarity the parameters that control the recovery error in this model class.", "full_text": "A class of network models recoverable by spectral clustering\n\nYali Wan\n\nDepartment of Statistics\nUniversity of Washington\n\nSeattle, WA 98195-4322, USA\nyaliwan@washington.edu\n\nMarina Meil\u0103\n\nDepartment of Statistics\nUniversity of Washington\n\nSeattle, WA 98195-4322, USA\nmmp@stat.washington.edu\n\nAbstract\n\nFinding communities in networks is a problem that remains difficult, in spite of the\namount of attention it has recently received. The Stochastic Block-Model (SBM)\nis a generative model for graphs with \u201ccommunities\u201d for which, because of its\nsimplicity, the theoretical understanding has advanced fast in recent years. 
In particular,\nthere have been various results showing that simple versions of spectral\nclustering using the Normalized Laplacian of the graph can recover the\ncommunities almost perfectly with high probability. Here we show that essentially the\nsame algorithm used for the SBM and for its extension called Degree-Corrected\nSBM, works on a wider class of Block-Models, which we call Preference Frame\nModels, with essentially the same guarantees. Moreover, the parametrization we\nintroduce clearly exhibits the free parameters needed to specify this class of\nmodels, and results in bounds that expose with more clarity the parameters that control\nthe recovery error in this model class.\n\n1 Introduction\n\nThere have been many recent advances in the recovery of communities in networks, under\n\u201cblock-model\u201d assumptions [19, 18, 9].\nIn particular, there have been advances in recovering communities by spectral\nclustering algorithms. These have been extended to models including node-specific propensities.\nIn this paper, we argue that one can further expand the model class for which recovery by spectral\nclustering is possible, and describe a model that subsumes a number of existing models, which we\ncall the PFM. We show that under the PFM model, the communities can be recovered with small\nerror, w.h.p. Our results correspond to what [6] termed the \u201cweak recovery\u201d regime, in which w.h.p.\nthe fraction of nodes that are mislabeled is o(1) when n \u2192 \u221e.\n\n2 The Preference Frame Model of graphs with communities\n\nThis model embodies the assumption that interactions at the community level (which we will also\ncall macro level) can be quantified by meaningful parameters. This general assumption underlies\nthe (p, q) and the related parameterizations of the SBM as well. 
We define a preference frame to\nbe a graph with K nodes, one for each community, that encodes the connectivity pattern at the\ncommunity level by a (non-symmetric) stochastic matrix R. Formally, given [K] = {1, . . . K}, a\nK \u00d7 K matrix R (det(R) \u2260 0) representing the transition matrix of a reversible Markov chain on\n[K], the weighted graph H = ([K], R), with edge set supp R (edges correspond to entries in R not\nbeing 0) is called a K-preference frame. Requiring reversibility is equivalent to requiring that there\nis a set of symmetric weights on the edges from which R can be derived ([17]). We note that without\nthe reversibility assumption, we would be modeling directed graphs, which we will leave for future\nwork. We denote by \u03c1 the left principal eigenvector of R, satisfying $\\rho^T R = \\rho^T$. W.l.o.g. we can\nassume the eigenvalue 1 of R has multiplicity 1\u00b9 and therefore we call \u03c1 the stationary distribution\nof R.\nWe say that a deterministic weighted graph G = (V, S) with weight matrix S (and edge set supp S)\nadmits a K-preference frame H = ([K], R) if and only if there exists a partition C of the nodes V\ninto K clusters C = {C1, . . . CK} of sizes n1, . . . , nK, respectively, so that the Markov chain on V\nwith transition matrix P determined by S satisfies the linear constraints\n\n$$\\sum_{j \\in C_m} P_{ij} = R_{lm} \\quad \\text{for all } i \\in C_l, \\text{ and all cluster indices } l, m \\in \\{1, 2, \\ldots, K\\}. \\qquad (1)$$\n\nThe matrix P is obtained from S by the standard row-normalization $P = D^{-1} S$ where D =\ndiag{d1:n}, $d_i = \\sum_{j=1}^{n} S_{ij}$.\nA random graph family over node set V admits a K-preference frame H, and is called a Preference\nFrame Model (PFM), if the edges i, j, i < j are sampled independently from Bernoulli distributions\nwith parameters Sij. It is assumed that the edges obtained are undirected and that Sij \u2264 1 for all\npairs i \u2260 j. We denote a realization from this process by A. Furthermore, let $\\hat d_i = \\sum_{j \\in V} A_{ij}$ and\nin general, throughout this paper, we will denote computable quantities derived from the observed\nA with the same letter as their model counterparts, decorated with the \u201chat\u201d symbol. Thus, \u02c6D =\ndiag \u02c6d1:n, $\\hat P = \\hat D^{-1} A$, and so on.\nOne question we will study is under what conditions the PFM model can be estimated from a given A\nby a standard spectral clustering algorithm. Evidently, the difficult part in this estimation problem is\nrecovering the partition C. If this is obtained correctly, the remaining parameters are easily estimated\nin a Maximum Likelihood framework.\nBut another question we elucidate refers to the parametrization itself. It is known that in the SBM\nand Degree Corrected-SBM (DC-SBM) [18], in spite of their simplicity, there are dependencies\nbetween the community level \u201cintensive\u201d parameters and the graph level \u201cextensive\u201d parameters, as\nwe will show below. In the parametrization of the PFM, we can explicitly show which are the free\nparameters and which are the dependent ones.\nSeveral network models in wide use admit a preference frame. One example is the SBM(B) model,\nwhich we briefly describe here. The parameters of this model are the cluster sizes (n1:K) and the\nconnectivity matrix $B \\in [0, 1]^{K \\times K}$. For two nodes i, j \u2208 V, the probability of an edge (i, j) is Bkl\niff i \u2208 Ck and j \u2208 Cl. The matrix B need not be symmetric. When Bkk = p, Bkl = q for\nk, l \u2208 [K], k \u2260 l, the model is denoted SBM(p, q). It is easy to verify that the SBM admits a\npreference frame. For instance, in the case of SBM(p, q), we have\n\n$$d_i = p(n_l - 1) + q(n - n_l) \\equiv d_{C_l}, \\quad \\text{for } i \\in C_l,$$\n$$R_{l,m} = \\frac{q n_m}{d_{C_l}} \\ \\text{ if } l \\neq m, \\qquad R_{l,l} = \\frac{p(n_l - 1)}{d_{C_l}}, \\quad \\text{for } l, m \\in \\{1, 2, \\ldots, K\\}.$$\n\nIn the above we have introduced the notation $d_{C_l} = \\sum_{j \\in C_l} d_j$. One particular realization of the\nPFM is the Homogeneous K-Preference Frame model (HPFM). In a HPFM, each node i \u2208 V is\ncharacterized by a weight, or propensity to form ties wi. For each pair of communities l, m with\nl \u2264 m and for each i \u2208 Cl, j \u2208 Cm we sample Aij with probability Sij given by\n\n$$S_{ij} = \\frac{R_{ml} w_i w_j}{\\rho_l}. \\qquad (2)$$\n\nThis formulation ensures detailed balance in the edge expectations, i.e. Sij = Sji. The HPFM is\nvirtually equivalent to what is known as the \u201cdegree model\u201d [8] or \u201cDC-SBM\u201d, up to a\nreparameterization\u00b2. Proposition 1 relates the node weights to the expected node degrees di. We note\nthat the main result we prove in this paper uses independent sampling of edges only to prove the\nconcentration of the Laplacian matrix. The PFM model can be easily extended to other graph models\nwith dependent edges if one could prove concentration and eigenvalue separation. For example,\nwhen R has rational entries, the subgraph induced by each block of A can be represented by a\nrandom d-regular graph with a specified degree.\n\n\u00b9 Otherwise the networks obtained would be disconnected.\n\u00b2 Here we follow the customary definition of this model, which does not enforce Sii = 0, even though this\nimplies a non-zero probability of self-loops.\n\nProposition 1 In a HPFM $d_i = w_i \\sum_{l=1}^{K} R_{kl} \\frac{w_{C_l}}{\\rho_l}$ whenever i \u2208 Ck and k \u2208 [K].\n\nEquivalent statements that the expected degrees in each cluster are proportional to the weights exist\nin [7, 19] and they are instrumental in analyzing this model. This particular parametrization\nimmediately implies in what case the degrees are globally proportional to the weights. 
This is, obviously,\nthe situation when wCl \u221d \u03c1l for all l \u2208 [K].\nAs we see, the node degrees in a HPFM are not directly determined by the propensities wi, but\ndepend on those by a multiplicative constant that varies with the cluster. This type of interaction\nbetween parameters has been observed in practically all extensions of the Stochastic Block-Model\nthat we are aware of, making parameter interpretation more difficult. Our following result establishes\nwhat are the free parameters of the PFM and of their subclasses. As it will turn out, these parameters\nand their interactions are easily interpretable.\n\nProposition 2 Let (n1, . . . nK) be a partition of n (assumed to represent the cluster sizes of C =\n{C1, . . . CK} a partition of node set V), R a non-singular K \u00d7 K stochastic matrix, \u03c1 its left\nprincipal eigenvector, and $\\pi_{C_1} \\in [0, 1]^{n_1}, \\ldots, \\pi_{C_K} \\in [0, 1]^{n_K}$ probability distributions over C1:K.\nThen, there exists a PFM consistent with H = ([K], R), with clustering C, and whose node degrees\nare given by\n\n$$d_i = d_{tot} \\rho_k \\pi_{C_k, i}, \\qquad (3)$$\n\nwhenever i \u2208 Ck, where $d_{tot} = \\sum_{i \\in V} d_i$ is a user parameter which is only restricted above by\nAssumption 2.\n\nThe proof of this result is constructive, and can be found in the extended version.\nThe parametrization shows to what extent one can specify independently the degree distribution of a\nnetwork model, and the connectivity parameters R. Moreover, it describes the pattern of connection\nof a node i as a composition of a macro-level pattern, which gives the total probability of i to\nform connections with a cluster l, and the micro-level distribution of connections between i and the\nmembers of Cl. These parameters are meaningful on their own and can be specified or estimated\nseparately, as they have no hidden dependence on each other or on n, K.\nThe PFM enjoys a number of other interesting properties. 
As this paper will show, almost all the\nproperties that make SBMs popular and easy to understand hold also for the much more flexible\nPFM. In the remainder of this paper we derive recovery guarantees for the PFM. As an additional\ngoal, we will show that in the frame we set with the PFM, the recovery conditions become clearer,\nmore interpretable, and occasionally less restrictive than for other models.\nAs already mentioned, the PFM includes many models that have been found useful by previous\nauthors. Yet, the PFM class is much more flexible than those individual models, in the sense that\nit allows other unexplored degrees of freedom (or, in other words, achieves the same advantages as\npreviously studied models with fewer constraints on the data). Note that there is an infinite number\nof possible random graphs G with the same parameters (d1:n, n1:K, R) satisfying the constraints (1)\nand Proposition 2, yet for reliable community detection we do not need to control S fully, but only\naggregate statistics like $\\sum_{j \\in C} A_{ij}$.\n\n3 Spectral clustering algorithm and main result\n\nNow, we address the community recovery problem from a random graph (V, A) sampled from\nthe PFM defined as above. We make the standard assumption that K is known. Our analysis is\nbased on a very common spectral clustering algorithm used in [13] and described also in [14, 21].\n\nInput: Graph (V, A) with |V| = n and $A \\in \\{0, 1\\}^{n \\times n}$, number of clusters K\nOutput: Clustering C\n1. Compute \u02c6D = diag( \u02c6d1,\u00b7\u00b7\u00b7 , \u02c6dn) and Laplacian\n\n$$\\hat L = \\hat D^{-1/2} A \\hat D^{-1/2} \\qquad (4)$$\n\n2. Calculate the K eigenvectors \u02c6Y1,\u00b7\u00b7\u00b7 , \u02c6YK associated with the K eigenvalues |\u02c6\u03bb1| \u2265 \u00b7\u00b7\u00b7 \u2265 |\u02c6\u03bbK|\nof \u02c6L. Normalize the eigenvectors to unit length. We denote them as the first K eigenvectors in the\nfollowing text;\n3. 
Set $\\hat V_i = \\hat D^{-1/2} \\hat Y_i$, i = 1,\u00b7\u00b7\u00b7 , K. Form matrix \u02c6V = [ \u02c6V1 \u00b7\u00b7\u00b7 \u02c6VK];\n4. Treating each row of \u02c6V as a point in K dimensions, cluster them by the K-means algorithm to\nobtain the clustering \u02c6C.\n\nAlgorithm 1: Spectral Clustering\n\nNote that the vectors \u02c6V are the first K eigenvectors of \u02c6P. The K-means algorithm is assumed to find\nthe global optimum. For more details on good initializations for K-means in step 4 see [16].\nWe quantify the difference between \u02c6C and the true clustering C by the mis-clustering rate perr,\nwhich is defined as\n\n$$p_{err} = 1 - \\frac{1}{n} \\max_{\\phi: [K] \\to [K]} \\sum_{k} |C_{\\phi(k)} \\cap \\hat C_k|. \\qquad (5)$$\n\nTheorem 3 (Mis-clustering rate bound for HPFM and PFM) Let the n \u00d7 n matrix S admit a\nPFM, and w1:n, R, \u03c1, P, A, d1:n have the usual meaning, and let \u03bb1:n be the eigenvalues of P,\nwith |\u03bbi| \u2265 |\u03bbi+1|. Let dmin = min d1:n be the minimum expected degree, \u02c6dmin = min \u02c6di, and\ndmax = maxij nSij. Let \u03b3 \u2265 1, \u03b7 > 0 be arbitrary numbers. Assume:\nAssumption 1 S admits a HPFM model and (2) holds.\nAssumption 2 Sij \u2264 1\nAssumption 3 \u02c6dmin \u2265 log(n)\nAssumption 4 dmin \u2265 log(n)\nAssumption 5 \u2203\u03ba > 0, dmax \u2264 \u03ba log n\nAssumption 6 grow > 0, where grow is defined in Proposition 4.\nAssumption 7 \u03bb1:K are the eigenvalues of R, and |\u03bbK| \u2212 |\u03bbK+1| = \u03c3 > 0.\n\nWe also assume that we run Algorithm 1 on S and that K-means finds the optimal solution. 
Then,\nfor n sufficiently large, the following statements hold with probability at least $1 - e^{-\\gamma}$.\nPFM Assumptions 2 - 7 imply\n\n$$p_{err} \\leq \\frac{K d_{tot}}{n d_{min} g_{row}} \\left[ \\frac{C_0 \\gamma^4}{\\sigma^2 \\log n} + \\frac{4 (\\log n)^{\\eta}}{\\hat d_{min}} \\right] \\qquad (6)$$\n\nHPFM Assumptions 1 - 6 imply\n\n$$p_{err} \\leq \\frac{K d_{tot}}{n d_{min} g_{row}} \\left[ \\frac{C_0 \\gamma^4}{\\lambda_K^2 \\log n} + \\frac{4 (\\log n)^{\\eta}}{\\hat d_{min}} \\right] \\qquad (7)$$\n\nwhere C0 is a constant depending on \u03ba and \u03b3.\n\nNote that perr decreases at least as 1/ log(n) when \u02c6dmin = dmin = log(n). This is because \u02c6dmin\nand dmin help with the concentration of L. Using Proposition 4, the distances between rows of V,\ni.e., the true centers of the K-means step, are lower bounded by grow/dtot. After plugging in the\nassumptions for dmin, \u02c6dmin, dmax, we obtain\n\n$$p_{err} \\leq \\frac{K \\kappa}{g_{row}} \\left[ \\frac{C_0 \\gamma^4}{\\sigma^2 \\log n} + \\frac{4}{(\\log n)^{1 - \\eta}} \\right]. \\qquad (8)$$\n\nWhen n is small, the first component on the right hand side dominates because of the constant C0,\nwhile the second part dominates when n is very large. This shows that perr decreases almost as\n1/ log n. Of the remaining quantities, \u03ba controls the spread of the degrees di. Notice that \u03bbK and\n\u03c3 are eigengaps in HPFM model and PFM model respectively and depend only on the preference\nframe, and likewise for grow. The eigengaps ensure the stability of principal spaces and the\nseparation from the spurious eigenvalues, as shown in Theorem 6. The term containing $(\\log n)^{\\eta}$ is\ndesigned to control the difference between di and \u02c6di with \u03b7 a small positive constant.\n\n3.1 Proof outline, techniques and main concepts\n\nThe proof of Theorem 3 (given in the extended version of the paper) relies on three steps, which\nare to be found in most results dealing with spectral clustering. 
First, concentration bounds of\nthe empirical Laplacian \u02c6L w.r.t L are obtained. There are various conditions under which these\ncan be obtained, and ours are most similar to the recent result of [9]. The other tools we use are\nHoeffding bounds and tools from linear algebra. Second, one needs to bound the perturbation of\nthe eigenvectors Y as a function of the perturbation in L. This is based on the pivotal results of\nDavis and Kahan, see e.g. [18]. A crucial ingredient in this type of theorem is the size of the\neigengap between the invariant subspace Y and its orthogonal complement. This is a condition that\nis model-dependent, and therefore we discuss the techniques we introduce for solving this problem\nin the PFM in the next subsection.\nThe third step is to bound the error of the K-means clustering algorithm. This is done by a counting\nargument. The crux of this step is to ensure the separation of the K distinct rows of V. This, again, is\nmodel dependent and we present our result below. The details and proof are in the extended version.\nAll proofs are for the PFM; to specialize to the HPFM, one replaces \u03c3 with |\u03bbK|.\n\n3.2 Cluster separation and bounding the spurious eigenvalues in the PFM\n\nProposition 4 (Cluster separation) Let V, \u03c1, d1:n have the usual meaning and define the cluster\nvolume $d_{C_k} = \\sum_{i \\in C_k} d_i$, and $c_{max}, c_{min}$ as $\\max_k, \\min_k \\frac{d_{C_k}}{n \\rho_k}$. Let i, j \u2208 V be nodes belonging\nrespectively to clusters k, m with k \u2260 m. Then,\n\n$$\\|V_{i:} - V_{j:}\\|^2 \\geq \\frac{1}{d_{tot}} \\left[ \\frac{1}{c_{max}} \\left( \\frac{1}{\\rho_k} + \\frac{1}{\\rho_m} \\right) - \\frac{1}{\\sqrt{\\rho_k \\rho_m}} \\left( \\frac{1}{c_{min}} - \\frac{1}{c_{max}} \\right) \\right] = \\frac{g_{row}}{d_{tot}}, \\qquad (9)$$\n\nwhere $g_{row} = \\left[ \\frac{1}{c_{max}} \\left( \\frac{1}{\\rho_k} + \\frac{1}{\\rho_m} \\right) - \\frac{1}{\\sqrt{\\rho_k \\rho_m}} \\left( \\frac{1}{c_{min}} - \\frac{1}{c_{max}} \\right) \\right]$. Moreover, if the columns of V are\nnormalized to length 1, the above result holds by replacing $d_{tot} c_{max,min}$ with $\\max, \\min_k \\frac{n_k}{\\rho_k}$.\n\nIn the square brackets, cmax,min depend on the cluster-level degree distribution, while all the other\nquantities depend only on the preference frame. Hence, this expression is invariant with n, and as\nlong as it is strictly positive, we have that the cluster separation is \u03a9(1/dtot).\nThe next theorem is crucial in proving that L has a constant eigengap. We express the eigengap of P\nin terms of the preference frame H and the mixing inside each of the clusters Ck. For this, we resort\nto generalized stochastic matrices, i.e. rectangular positive matrices with equal row sums, and we\nrelate their properties to the mixing of Markov chains on bipartite graphs.\nThese tools are introduced here, for the sake of intuition, together with the main spectral result,\nwhile the rest of the proofs are in the extended version.\nGiven C, for any vector $x \\in \\mathbb{R}^{n}$, we denote by xk, k = 1, . . . K, the block of x indexed by elements\nof cluster k of C. 
Similarly, for any square matrix $A \\in \\mathbb{R}^{n \\times n}$, we denote by Akl = [Aij]i\u2208k,j\u2208l the\nblock with rows indexed by i \u2208 k, and columns indexed by j \u2208 l.\nDenote by \u03c1, \u03bb1:K, $\\nu_{1:K} \\in \\mathbb{R}^{K}$ respectively the stationary distribution, eigenvalues\u00b3, and eigenvectors\nof R.\nWe are interested in block stochastic matrices P for which the eigenvalues of R are the principal\neigenvalues. We call \u03bbK+1 . . . \u03bbn spurious eigenvalues. Theorem 6 below is a sufficient condition\nthat bounds |\u03bbK+1| whenever each of the $K^2$ blocks of P is \u201chomogeneous\u201d in a sense that will be\ndefined below.\nWhen we consider the matrix $L = D^{-1/2} S D^{-1/2}$ partitioned according to C, it will be convenient\nto consider the off-diagonal blocks in pairs. This is why the next result describes the properties of\nmatrices consisting of a pair of off-diagonal blocks.\n\nProposition 5 (Eigenvalues for the off-diagonal blocks) Let M be the square matrix\n\n$$M = \\begin{bmatrix} 0 & B \\\\ A & 0 \\end{bmatrix} \\qquad (10)$$\n\nwhere $A \\in \\mathbb{R}^{n_2 \\times n_1}$ and $B \\in \\mathbb{R}^{n_1 \\times n_2}$, and let $x = \\begin{bmatrix} x_1 \\\\ x_2 \\end{bmatrix}$, $x_{1,2} \\in \\mathbb{C}^{n_{1,2}}$ be an eigenvector of M\nwith eigenvalue \u03bb. Then\n\n$$B x_2 = \\lambda x_1, \\qquad A B x_2 = \\lambda^2 x_2 \\qquad (11)$$\n$$A x_1 = \\lambda x_2, \\qquad B A x_1 = \\lambda^2 x_1 \\qquad (12)$$\n$$M^2 = \\begin{bmatrix} B A & 0 \\\\ 0 & A B \\end{bmatrix} \\qquad (13)$$\n\nMoreover, if M is symmetric, i.e. $B = A^T$, then \u03bb is a singular value of A, x is real, and \u2212\u03bb is\nalso an eigenvalue of M with eigenvector $[x_1^T \\; -x_2^T]^T$. Assuming n2 \u2264 n1, and that A is full rank,\none can write $A = V \\Lambda U^T$ with $V \\in \\mathbb{R}^{n_2 \\times n_2}$, $U \\in \\mathbb{R}^{n_1 \\times n_2}$ orthogonal matrices, and \u039b a diagonal\nmatrix of non-zero singular values.\nTheorem 6 (Bounding the spurious eigenvalues of L) Let C, L, P, D, S, R, \u03c1 be defined as above,\nand let \u03bb be an eigenvalue of P . 
Assume that (1) P is block-stochastic with respect to C; (2) \u03bb1:K are\nthe eigenvalues of R, and |\u03bbK| > 0; (3) \u03bb is not an eigenvalue of R; (4) denote by $\\lambda_3^{kl}$ ($\\lambda_2^{kk}$) the third\n(second) largest in magnitude eigenvalue of block $M_{kl}$ ($L_{kk}$) and assume that $\\frac{|\\lambda_3^{kl}|}{\\lambda_{max}(M_{kl})} \\leq c < 1$\n($\\frac{|\\lambda_2^{kk}|}{\\lambda_{max}(L_{kk})} \\leq c$). Then, the spurious eigenvalues of P are bounded by c times a constant that\ndepends only on R.\n\n$$|\\lambda| \\leq c \\max_{k=1:K} \\left( r_{kk} + \\sum_{l \\neq k} \\sqrt{r_{kl} r_{lk}} \\right) \\qquad (14)$$\n\nRemarks: The factor that multiplies c can be further bounded denoting $a = [\\sqrt{r_{kl}}]^T_{l=1:K}$, $b = [\\sqrt{r_{lk}}]^T_{l=1:K}$\n\n$$r_{kk} + \\sum_{l \\neq k} \\sqrt{r_{kl} r_{lk}} = a^T b \\leq \\|a\\| \\|b\\| = \\sqrt{\\sum_{l=1}^{K} r_{kl}} \\sqrt{\\sum_{l=1}^{K} r_{lk}} = \\sqrt{\\sum_{l=1}^{K} r_{lk}} \\qquad (15)$$\n\nIn other words,\n\n$$|\\lambda| \\leq c \\max_{k=1:K} \\sqrt{\\sum_{l=1}^{K} r_{lk}} \\qquad (16)$$\n\nThe maximum column sum of a stochastic matrix is 1 if the matrix is doubly stochastic and larger\nthan 1 otherwise, and can be as large as $\\sqrt{K}$. However, one must remember that the interesting R\nmatrices have \u201clarge\u201d eigenvalues. In particular we will be interested in \u03bbK > c. It is expected that\nunder these conditions, the factor depending on R will be close to 1.\n\n\u00b3 Here too, eigenvalues will always be ordered in decreasing order of their magnitudes, with positive values\npreceding negative ones of the same magnitude. Consequently, for any stochastic matrix, \u03bb1 = 1 always.\n\nThe second remark is on the condition (3), that all blocks have small spurious eigenvalues. This\ncondition is not merely a technical convenience. 
If a block had a large eigenvalue, near 1 or \u22121\n(times its \u03bbmax), then that block could itself be broken into two distinct clusters. In other words, the\nclustering C would not accurately capture the cluster structure of the matrix P . Hence, condition (3)\namounts to requiring that no other cluster structure is present, in other words that within each block,\nthe Markov chain induced by P mixes well.\n\n4 Related work\n\nPrevious results we used The Laplacian concentration results use a technique introduced recently\nby [9], and some of the basic matrix theoretic results are based on [14] which studied the P and L\nmatrix in the context of spectral clustering. As any of the many works we cite, we are indebted to\nthe pioneering work on the perturbation of invariant subspaces of Davis and Kahan [18, 19, 20].\n\n4.1 Previous related models\n\nThe con\ufb01guration model for regular random graphs [4, 11] and for graphs with general \ufb01xed degrees\n[10, 12] is very well known. It can be shown by a simple calculation that the con\ufb01guration model\nalso admits a K-preference frame. In the particular case when the diagonal of the R matrix is 0 and\nthe connections between clusters are given by a bipartite con\ufb01guration model with \ufb01xed degrees,\nK-preference frames have been studied by [15] under the name \u201cequitable graphs\u201d; the object there\nwas to provide a way to calculate the spectrum of the graph.\nSince the PFM is itself an extension of the SBM, many other extensions of the latter will bear\nresemblance to PFM. Here we review only a subset of these, a series of strong relatively recent\nadvances, which exploit the spectral properties of the SBM and extend this to handle a large range\nof degree distributions [7, 19, 5]. 
The PFM includes each of these models as a subclass4.\nIn [7] the authors study a model that coincides (up to some multiplicative constants) with the HPFM.\nThe paper introduces an elegant algorithm that achieves partial recovery or better, which is based\non the spectral properties of a random Laplacian-like matrix, and does not require knowledge of the\npartition size K.\nThe PFM also coincides with the model of [1] and [8] called the expected degree model w.r.t the\ndistribution of intra-cluster edges, but not w.r.t the ambient edges, so the HPFM is a subclass of this\nmodel.\nA different approach to recovery The papers [5, 18, 9] propose regularizing the normalized Lapla-\ncian with respect to the in\ufb02uence of low degrees, by adding the scaled unit matrix \u03c4 I to the incidence\nmatrix A, and thereby they achieve recovery for much more imbalanced degree distributions than\nus. Currently, we do not see an application of this interesting technique to the PFM, as the diagonal\nregularization destroys the separation of the intracluster and intercluster transitions, which guaran-\ntee the clustering property of the eigenvectors. Therefore, currently we cannot break the n log n\nlimit into the ultra-sparse regime, although we recognize that this is an important current direction\nof research.\nRecovery results like ours can be easily extended to weighted, non-random graphs, and in this sense\nthey are relevant to the spectral clustering of these graphs, when they are assumed to be noisy\nversions of a G that admits a PFM.\n\n4.2 An empirical comparison of the recovery conditions\n\nAs obtaining general results in comparing the various recovery conditions in the literature would be\na tedious task, here we undertake to do a numerical comparison. While the conclusions drawn from\nthis are not universal, they illustrate well the stringency of various conditions, as well as the gap\nbetween theory and actual recovery. 
For this, we construct HPFM models, and verify numerically if\nthey satisfy the various conditions. We have also clustered random graphs sampled from this model,\nwith good results (shown in the extended version).\n\n\u2074 In particular, the models proposed in [7, 19, 5] are variations of the DC-SBM and thus forms of the\nhomogeneous PFM.\n\nWe generate S from the HPFM model with K = 5, n = 5000. Each wi is uniformly generated\nfrom (0.5, 1). n1:K = (500, 1000, 1500, 1000, 1000), grow > 0, \u03bb1:K = (1, 0.8, 0.6, 0.4, 0.2). The\nmatrix R is given below; note its last row in which $r_{55} < \\sum_{l=1}^{4} r_{5l}$.\n\n$$R = \\begin{pmatrix} .80 & .07 & .02 & .02 & .09 \\\\ .04 & .52 & .24 & .12 & .08 \\\\ .01 & .20 & .65 & .15 & .00 \\\\ .01 & .08 & .12 & .70 & .08 \\\\ .13 & .21 & .02 & .32 & .33 \\end{pmatrix}, \\qquad \\rho = (.25, .44, .54, .65, .17). \\qquad (17)$$\n\nThe conditions we are verifying include, besides ours, those obtained by [18], [19], [3] and [5];\nsince the original S is a perfect case for spectral clustering of weighted graphs, we also verify the\ntheoretical recovery conditions for spectral clustering in [2] and [16].\nOur result Theorem 3 Assumption 1 and 2 automatically hold from the construction of the data.\nBy simulating the data, we find that dmin = 77.4, \u02c6dmin = 63, both of which are bigger than\nlog n = 8.52. Therefore Assumption 3 and 4 hold. dmax = 509.3, grow = 1.82 > 0, thus\nAssumption 5 and 6 hold. After running Algorithm 1, the mis-clustering rate is r = 0.0008, which satisfies\nthe theoretical bound. In conclusion, the dataset fits into both the assumptions and conclusion of\nTheorem 3.\nQin and Rohe [18] This paper has an assumption on the lower bound on \u03bbK, that is\n$\\frac{1}{8\\sqrt{3}} \\lambda_K \\geq \\sqrt{\\frac{K \\ln(K/\\epsilon)}{d_{min}}}$, so that the concentration bound holds with probability (1 \u2212 \u03b5). We set \u03b5 = 0.1 and\nobtain \u03bbK \u2265 12.3, which is impossible to hold since \u03bbK is upper bounded by 1\u2075.\n\n\u2075 To make \u03bb \u2264 1 possible, one needs dmin \u2265 11718.\n\nRohe, Chatterjee, Yu [19] Here, one defines $\\tau_n = \\frac{d_{min}}{n}$, and requires $\\tau_n^2 \\log n > 2$ to ensure the\nconcentration of L. To meet this assumption, with n = 5000, one needs dmin \u2265 2422, while in our case\ndmin = 77.4. The assumption requires a very dense graph and is not satisfied in this dataset.\nBalcan, Borgs, Braverman, Chayes [3] Their theorem is based on self-determined community\nstructure. It requires all the nodes to be more connected within their own cluster. However, in our graph,\n1296 out of 5000 nodes have more connections to outside nodes than to nodes in their own cluster.\nNg, Jordan, Weiss [16] require \u03bb2 < 1 \u2212 \u03b4, where $\\delta > (2 + \\sqrt{2})\\epsilon$, $\\epsilon = \\sqrt{K(K - 1)\\epsilon_1 + K\\epsilon_2^2}$,\n$\\epsilon_1 \\geq \\max_{i_1, i_2 \\in \\{1, \\cdots, K\\}} \\sum_{j \\in C_{i_1}} \\sum_{k \\in C_{i_2}} \\frac{A_{jk}^2}{\\hat d_j \\hat d_k}$, $\\epsilon_2 \\geq \\max_{i \\in \\{1, \\cdots, K\\}} \\frac{\\sum_{k: k \\notin S_i} A_{jk}}{\\hat d_j} \\left( \\sum_{k,l \\in S_i} \\frac{A_{kl}^2}{\\hat d_k \\hat d_l} \\right)^{1/2}$.\nOn the given data, we find that \u03b5 \u2265 36.69, and \u03b4 \u2265 125.28, which is impossible to hold since \u03b4\nneeds to be smaller than 1.\nChaudhuri, Chung, Tsiatas [5] The recovery theorem of this paper requires $d_i \\geq \\frac{128}{9} \\ln(6n/\\delta)$,\nso that when all the assumptions hold, it recovers the clustering correctly with probability at least\n1 \u2212 6\u03b4. We set \u03b4 = 0.01, and obtain that dmin = 77.40, $\\frac{128}{9} \\ln(6n/\\delta) = 212.11$. Therefore the\nassumption fails as well.\nFor our method, the hardest condition to satisfy, and the most different from the others, was\nAssumption 6. We repeated this experiment with other weight distributions for which this assumption\nfails. The assumptions in the related papers continued to be violated. In [Qin and Rohe], we obtain\n\u03bbK \u2265 17.32. In [Rohe, Chatterjee, Yu], we still need dmin \u2265 2422. In [Balcan, Borgs, Braverman,\nChayes], we get 1609 points more connected to nodes outside their own cluster. In [Balakrishnan,\nXu, Krishnamurthy, Singh], we get \u03c3 = 0.172, which needs to satisfy \u03c3 = o(0.3292). In [Ng, Jordan,\nWeiss], we obtain \u03b4 \u2265 175.35. Therefore, the assumptions in these papers are all violated as well.\n\n5 Conclusion\n\nIn this paper, we have introduced the preference frame model, which is more flexible and subsumes\nmany current models including SBM and DC-SBM. It produces state-of-the-art recovery rates\ncomparable to existing models. To accomplish this, we used a parametrization that is clearer and more\nintuitive. The theoretical results are based on the new geometric techniques which control the\neigengaps of the matrices with piecewise constant eigenvectors.\nWe note that the main result, Theorem 3, uses independent sampling of edges only to prove the\nconcentration of the Laplacian matrix. The PFM model can be easily extended to other graph models with\ndependent edges if one could prove concentration and eigenvalue separation. For example, when\nR has rational entries, the subgraph induced by each block of A can be represented by a random\nd-regular graph with a specified degree.\n\nReferences\n[1] Sanjeev Arora, Rong Ge, Sushant Sachdeva, and Grant Schoenebeck. Finding overlapping\ncommunities in social networks: toward a rigorous approach. In Proceedings of the 13th ACM\nConference on Electronic Commerce, pages 37\u201354. ACM, 2012.\n\n[2] Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, and Aarti Singh. Noise thresholds\nfor spectral clustering. 
In Advances in Neural Information Processing Systems, pages 954\u2013962, 2011.\n\n[3] Maria-Florina Balcan, Christian Borgs, Mark Braverman, Jennifer Chayes, and Shang-Hua Teng. Finding endogenously formed communities. arXiv preprint arXiv:1201.4899v2, 2012.\n\n[4] B\u00e9la Bollob\u00e1s. Random Graphs. Cambridge University Press, second edition, 2001.\n\n[5] K. Chaudhuri, F. Chung, and A. Tsiatas. Spectral clustering of graphs with general degrees in extended planted partition model. Journal of Machine Learning Research, pages 1\u201323, 2012.\n\n[6] Yudong Chen and Jiaming Xu. Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. arXiv preprint arXiv:1402.1267, 2014.\n\n[7] Amin Coja-Oghlan and Andre Lanka. Finding planted partitions in random graphs with general degree distributions. SIAM Journal on Discrete Mathematics, 23:1682\u20131714, 2009.\n\n[8] M. O. Jackson. Social and Economic Networks. Princeton University Press, 2008.\n\n[9] Can M. Le and Roman Vershynin. Concentration and regularization of random graphs. 2015.\n\n[10] Brendan McKay. Asymptotics for symmetric 0-1 matrices with prescribed row sums. Ars Combinatoria, 19A:15\u201326, 1985.\n\n[11] Brendan McKay and Nicholas Wormald. Uniform generation of random regular graphs of moderate degree. Journal of Algorithms, 11:52\u201367, 1990.\n\n[12] Brendan McKay and Nicholas Wormald. Asymptotic enumeration by degree sequence of graphs with degrees o(n^{1/2}). Combinatorica, 11(4):369\u2013382, 1991.\n\n[13] Marina Meil\u0103 and Jianbo Shi. Learning segmentation by random walks. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13, pages 873\u2013879, Cambridge, MA, 2001. MIT Press.\n\n[14] Marina Meil\u0103 and Jianbo Shi. A random walks view of spectral segmentation. In T. Jaakkola and T. Richardson, editors, Artificial Intelligence and Statistics AISTATS, 2001.\n\n[15] M. E. J. Newman and Travis Martin. Equitable random graphs. 2014.\n\n[16] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849\u2013856, 2002.\n\n[17] J. R. Norris. Markov Chains. Cambridge University Press, 1997.\n\n[18] Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochastic blockmodel. In Advances in Neural Information Processing Systems, pages 3120\u20133128, 2013.\n\n[19] Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878\u20131915, 2011.\n\n[20] Gilbert W. Stewart and Ji-guang Sun. Matrix Perturbation Theory, volume 175. Academic Press, New York, 1990.\n\n[21] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395\u2013416, 2007.", "award": [], "sourceid": 1819, "authors": [{"given_name": "Yali", "family_name": "Wan", "institution": "University of Washington"}, {"given_name": "Marina", "family_name": "Meila", "institution": "University of Washington"}]}