{"title": "Spectral Clustering of graphs with the Bethe Hessian", "book": "Advances in Neural Information Processing Systems", "page_first": 406, "page_last": 414, "abstract": "Spectral clustering is a standard approach to label nodes on a graph by studying the (largest or lowest) eigenvalues of a symmetric real matrix such as e.g. the adjacency or the Laplacian. Recently, it has been argued that using instead a more complicated, non-symmetric and higher dimensional operator, related to the non-backtracking walk on the graph, leads to improved performance in detecting clusters, and even to optimal performance for the stochastic block model. Here, we propose to use instead a simpler object, a symmetric real matrix known as the Bethe Hessian operator, or deformed Laplacian. We show that this approach combines the performances of the non-backtracking operator, thus detecting clusters all the way down to the theoretical limit in the stochastic block model, with the computational, theoretical and memory advantages of real symmetric matrices.", "full_text": "Spectral Clustering of Graphs with the Bethe Hessian\n\nAlaa Saade\n\nLaboratoire de Physique Statistique, CNRS UMR 8550\n\u00b4Ecole Normale Superieure, 24 Rue Lhomond Paris 75005\n\nFlorent Krzakala\u2217\n\nSorbonne Universit\u00b4es, UPMC Univ Paris 06\n\nLaboratoire de Physique Statistique, CNRS UMR 8550\n\n\u00b4Ecole Normale Superieure, 24 Rue Lhomond\n\nParis 75005\n\nLenka Zdeborov\u00b4a\n\nInstitut de Physique Th\u00b4eorique\n\nCEA Saclay and CNRS URA 2306\n\n91191 Gif-sur-Yvette, France\n\nAbstract\n\nSpectral clustering is a standard approach to label nodes on a graph by study-\ning the (largest or lowest) eigenvalues of a symmetric real matrix such as e.g.\nthe adjacency or the Laplacian. 
Recently, it has been argued that using instead a more complicated, non-symmetric and higher dimensional operator, related to the non-backtracking walk on the graph, leads to improved performance in detecting clusters, and even to optimal performance for the stochastic block model. Here, we propose to use instead a simpler object, a symmetric real matrix known as the Bethe Hessian operator, or deformed Laplacian. We show that this approach combines the performances of the non-backtracking operator, thus detecting clusters all the way down to the theoretical limit in the stochastic block model, with the computational, theoretical and memory advantages of real symmetric matrices.\n\nClustering a graph into groups or functional modules (sometimes called communities) is a central task in many fields ranging from machine learning to biology. A common benchmark for this problem is to consider graphs generated by the stochastic block model (SBM) [7, 22]. In this case, one considers n vertices and each of them has a group label gv ∈ {1, . . . , q}. A graph is then created as follows: all edges are generated independently according to a q × q matrix p of probabilities, with Pr[Au,v = 1] = pgu,gv. The group labels are hidden, and the task is to infer them from the knowledge of the graph. The stochastic block model generates graphs that are a generalization of the Erdős–Rényi ensemble where an unknown labeling has been hidden.\nWe concentrate on the sparse case, where algorithmic challenges appear. In this case pab is O(1/n), and we denote pab = cab/n. For simplicity we concentrate on the most commonly-studied case where groups are equally sized, cab = cin if a = b and cab = cout if a ≠ b. Fixing cin > cout is referred to as the assortative case, because vertices from the same group connect with higher probability than with vertices from other groups. cout > cin is called the disassortative case.
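The generative process just described is simple to implement. The following is a minimal sketch (the function name and interface are ours, not part of the paper):

```python
import numpy as np

def sbm_graph(n, q, c_in, c_out, rng=None):
    """Sample a graph from the symmetric stochastic block model.

    Each vertex receives a hidden label in {0, ..., q-1}; an edge
    (u, v) is then present independently with probability c_in/n
    if the two labels agree and c_out/n otherwise.
    """
    rng = np.random.default_rng(rng)
    labels = rng.integers(q, size=n)
    same = labels[:, None] == labels[None, :]
    p = np.where(same, c_in / n, c_out / n)
    upper = np.triu(rng.random((n, n)) < p, k=1)  # draw each pair once
    A = (upper | upper.T).astype(int)             # symmetric, empty diagonal
    return A, labels
```

With q = 2, cin = 7 and cout = 1, for instance, the average degree is c = (cin + cout)/2 = 4, and the detectability condition (1) below reads |7 − 1| = 6 > 2√4 = 4, so the hidden clusters are in the detectable regime.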
An important conjecture [4] is that any tractable algorithm will only detect communities if\n\n|cin − cout| > q √c ,  (1)\n\nwhere c is the average degree. In the case of q = 2 groups, in particular, this has been rigorously proven [15, 12] (in this case, one can also prove that no algorithm could detect communities if this condition is not met). An ideal clustering algorithm should have a low computational complexity while being able to perform optimally for the stochastic block model, detecting clusters down to the transition (1).\n\n* This work has been supported in part by the ERC under the European Union's 7th Framework Programme Grant Agreement 307087-SPARCS\n\nSo far there are two algorithms in the literature able to detect clusters down to the transition (1). One is a message-passing algorithm based on belief propagation [5, 4]. This algorithm, however, needs to be fed with the correct parameters of the stochastic block model to perform well, and its computational complexity scales quadratically with the number of clusters, which is an important practical limitation. To avoid such problems, the most popular non-parametric approaches to clustering are spectral methods, where one classifies vertices according to the eigenvectors of a matrix associated with the network, for instance its adjacency matrix [11, 16]. However, while this works remarkably well on regular, or dense enough graphs [2], the standard versions of spectral clustering are suboptimal on graphs generated by the SBM, and in some cases completely fail to detect communities even when other (more complex) algorithms such as belief propagation can do so. Recently, a new class of spectral algorithms based on the use of a non-backtracking walk on the directed edges of the graph has been introduced [9] and argued to be better suited for spectral clustering.
In particular, it has been shown to be optimal for graphs generated by the stochastic block model, and able to detect communities even in the sparse case all the way down to the theoretical limit (1).\nThese results are, however, not entirely satisfactory. First, the use of a high-dimensional matrix (of dimension 2m, where m is the number of edges, rather than n, the number of nodes) can be expensive, both in terms of computational time and memory. Secondly, linear algebra methods are faster and more efficient for symmetric matrices than non-symmetric ones. The first problem was partially resolved in [9], where an equivalent operator of dimension 2n was shown to exist. It was still, however, a non-symmetric one and, more importantly, the reduction does not extend to weighted graphs, and thus presents a strong limitation.\nIn this contribution, we provide the best of both worlds: a non-parametric spectral algorithm for clustering with a symmetric n × n, real operator that performs as well as the non-backtracking operator of [9], in the sense that it identifies communities as soon as (1) holds. We show numerically that our approach performs as well as the belief-propagation algorithm, without needing prior knowledge of any parameter, making it the simplest algorithmically among the best-performing clustering methods. This operator is actually not new, and has been known as the Bethe Hessian in the context of statistical physics and machine learning [14, 17] or the deformed Laplacian in other fields. However, to the best of our knowledge, it has never been considered in the context of spectral clustering.\nThe paper is organized as follows. In Sec. 1 we give the expression of the Bethe Hessian operator. We discuss in detail its properties and its connection with both the non-backtracking operator and an Ising spin glass in Sec. 2. In Sec. 3, we study analytically the spectrum in the case of the stochastic block model.
Finally, in Sec. 4 we perform numerical tests on both the stochastic block model and on some real networks.\n\n1 Clustering based on the Bethe Hessian matrix\n\nLet G = (V, E) be a graph with n vertices, V = {1, ..., n}, and m edges. Denote by A its adjacency matrix, and by D the diagonal matrix defined by Dii = di, ∀i ∈ V, where di is the degree of vertex i. We then define the Bethe Hessian matrix, sometimes called the deformed Laplacian, as\n\nH(r) := (r² − 1)1 − rA + D ,  (2)\n\nwhere |r| > 1 is a regularizer that we will set to a well-defined value |r| = rc depending on the graph, for instance rc = √c in the case of the stochastic block model, where c is the average degree of the graph (see Sec. 2.1).\nThe spectral algorithm that is the main result of this paper works as follows: we compute the eigenvectors associated with the negative eigenvalues of both H(rc) and H(−rc), and cluster them with a standard clustering algorithm such as k-means (or simply by looking at the sign of the components in the case of two communities). The negative eigenvalues of H(rc) reveal the assortative aspects, while those of H(−rc) reveal the disassortative ones.\nFigure 1 illustrates the spectral properties of the Bethe Hessian (2) for networks generated by the stochastic block model. When r = ±√c the informative eigenvalues (i.e. those having eigenvectors correlated to the cluster structure) are the negative ones, while the non-informative bulk remains positive. There are as many negative eigenvalues as there are hidden clusters. It is thus straightforward to select the relevant eigenvectors. This is very unlike the situation for the operators used in standard spectral clustering algorithms (except, again, for the non-backtracking operator) where\n\nFigure 1: Spectral density of the Bethe Hessian for various values of the regularizer r on the stochastic block model.
The red dots are the result of the direct diagonalization of the Bethe Hessian for a graph of 10^4 vertices with 2 clusters, with c = 4, cin = 7, cout = 1. The black curves are the solutions to the recursion (15) for c = 4, obtained from population dynamics (with a population of size 10^5), see section 3. We isolated the two smallest eigenvalues, represented as small bars for convenience. The dashed black line marks the x = 0 axis, and the inset is a zoom around this axis. At large value of r (top left), r = 5, the Bethe Hessian is positive definite and all eigenvalues are positive. As r decays, the spectrum moves towards the x = 0 axis. The smallest (non-informative) eigenvalue reaches zero for r = c = 4 (middle top), followed, as r decays further, by the second (informative) eigenvalue at r = (cin − cout)/2 = 3, which is the value of the second largest eigenvalue of B in this case [9] (top right). Finally, the bulk reaches 0 at rc = √c = 2 (bottom left). At this point, the information is in the negative part, while the bulk is in the positive part. Interestingly, if r decays further (bottom middle and right) the bulk of the spectrum remains positive, but the informative eigenvalues blend back into the bulk. The best choice is thus to work at rc = √c = 2.\n\none must decide in a somehow ambiguous way which eigenvalues are relevant (outside the bulk) or not (inside the bulk). Here, on the contrary, no prior knowledge of the number of communities is needed.\n\nOn more general graphs, we argue that the best choice for the regularizer is rc = √ρ(B), where ρ(B) is the spectral radius of the non-backtracking operator. We support this claim both numerically, on real world networks (sec. 4.2), and analytically (sec. 3). We also show that ρ(B) can be computed without building the matrix B itself, by efficiently solving a quadratic eigenproblem (sec. 2.1).\nThe Bethe Hessian can be generalized straightforwardly to the weighted case: if the edge (i, j) carries a weight wij, then we can use the matrix H̃(r) defined by\n\nH̃(r)ij = δij ( 1 + Σ_{k∈∂i} w²ik / (r² − w²ik) ) − r wij Aij / (r² − w²ij) ,  (3)\n\nwhere ∂i denotes the set of neighbors of vertex i. This is in fact the general expression of the Bethe Hessian of a certain weighted statistical model (see section 2.2). If all weights are equal to unity, H̃ reduces to (2) up to a trivial factor. Most of the arguments developed in the following generalize immediately to H̃, including the relationship with the weighted non-backtracking operator, introduced in the conclusion of [9].\n\n2 Derivation and relation to previous works\n\nOur approach is connected to both the spectral algorithm using the non-backtracking matrix and to an Ising spin glass model. We now discuss these connections, and the properties of the Bethe Hessian operator along the
way.\n\n2.1 Relation with the non-backtracking matrix\n\nThe non-backtracking operator of [9] is defined as a 2m × 2m non-symmetric matrix indexed by the directed edges of the graph i → j,\n\nBi→j,k→l = δjk(1 − δil) .  (4)\n\nThe remarkable efficiency of the non-backtracking operator is due to the particular structure of its (complex) spectrum. For graphs generated by the SBM the spectrum decomposes into a bulk of uninformative eigenvalues sharply constrained when n → ∞ to the disk of radius √ρ(B), where ρ(B) is the spectral radius of B [20], well separated from the real, informative eigenvalues, that lie outside of this circle. It was also remarked that the number of real eigenvalues outside of the circle is the number of communities, when the graph was generated by the stochastic block model. More precisely, the presence of assortative communities yields real positive eigenvalues larger than √ρ(B), while the presence of disassortative communities yields real negative eigenvalues smaller than −√ρ(B). The authors of [9] showed that all eigenvalues λ of B that are different from ±1 are roots of the polynomial\n\ndet [(λ² − 1)1 − λA + D] = det H(λ) .  (5)\n\nThis is known in graph theory as the Ihara-Bass formula for the graph zeta function. It provides the link between B and the (determinant of the) Bethe Hessian (already noticed in [23]): a real eigenvalue of B corresponds to a value of r such that the Bethe Hessian has a vanishing eigenvalue. For any finite n, when r is large enough, H(r) is positive definite. Then as r decreases, a new negative eigenvalue of H(r) appears when it crosses the zero axis, i.e. whenever r is equal to a real positive eigenvalue λ of B. The null space of H(λ) is related to the corresponding eigenvector of B. Denoting (vi)1≤i≤n the eigenvector of H(λ) with eigenvalue 0, and (vi→j)(i,j)∈E the eigenvector of B with eigenvalue λ, we have [9]:\n\nvi = Σ_{k∈∂i} vk→i .  (6)\n\nTherefore the vector (vi)1≤i≤n is correlated with the community structure when (vi→j)(i,j)∈E is. The numerical experiments of section 4 show that when r = √c < λ, the eigenvector (vi)1≤i≤n corresponds to a strictly negative eigenvalue, and is even more correlated with the community structure than the eigenvector (vi→j)(i,j)∈E. This fact still lacks a proper theoretical understanding. We provide in section 2.2 a different, physical justification to the relevance of the “negative” eigenvectors of the Bethe Hessian for community detection. Of course, the same phenomenon takes place when increasing r from a large negative value.
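The correspondence above translates into a concrete procedure for the algorithm of Sec. 1: form H(r), keep the eigenvectors attached to its negative eigenvalues, and cluster them. Below is a minimal sketch of ours using SciPy's sparse routines; it sets rc through the estimate ρ(B) = ⟨d²⟩/⟨d⟩ − 1, which holds for large random graphs of a given degree distribution (all function names are ours):

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import eigsh

def bethe_hessian(A, r):
    """H(r) = (r^2 - 1) 1 - r A + D for a sparse adjacency matrix A."""
    n = A.shape[0]
    d = np.asarray(A.sum(axis=1)).ravel()  # degree sequence
    return (r ** 2 - 1) * identity(n, format="csr") - r * A + diags(d)

def rc_estimate(A):
    """sqrt(rho(B)) via rho(B) = <d^2>/<d> - 1 (large random graphs)."""
    d = np.asarray(A.sum(axis=1)).ravel()
    return np.sqrt((d ** 2).mean() / d.mean() - 1.0)

def negative_eigenvectors(A, r, k=10):
    """Eigenvectors of H(r) attached to its negative eigenvalues."""
    H = bethe_hessian(A, r).asfptype()
    k = min(k, A.shape[0] - 1)
    vals, vecs = eigsh(H, k=k, which="SA")  # smallest algebraic eigenvalues
    return vecs[:, vals < 0]
```

One would then run k-means on the columns of `negative_eigenvectors(A, rc)` and `negative_eigenvectors(A, -rc)` stacked side by side; their total number estimates the number of clusters.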
In order to translate all the informative eigenvalues of B into negative eigenvalues of H(r) we adopt\n\nrc = √ρ(B) ,  (7)\n\nsince all the relevant eigenvalues of B are outside the circle of radius rc. On the other hand, H(r = 1) is the standard, positive-semidefinite, Laplacian so that for r < rc, the negative eigenvalues of H(r) move back into the positive part of the spectrum. This is consistent with the observation of [9] that the eigenvalues of B come in pairs having their product close to ρ(B), so that for each root λ > rc of (5), corresponding to the appearance of a new negative eigenvalue, there is another root λ' ≃ ρ(B)/λ < rc which we numerically found to correspond to the same eigenvalue becoming positive again.\nLet us stress that to compute ρ(B), we do not need to actually build the non-backtracking matrix. First, for large random networks of a given degree distribution, ρ(B) = ⟨d²⟩/⟨d⟩ − 1 [9], where ⟨d⟩ and ⟨d²⟩ are the first and second moments of the degree distribution. In a more general setting, we can efficiently refine this initial guess by solving for the closest root of the quadratic eigenproblem defined by (5), e.g. using a standard SLP algorithm [19]. With the choice (7), the informative eigenvalues of B are in one-to-one correspondence with the union of negative eigenvalues of H(rc) and H(−rc). Because B has as many informative eigenvalues as there are (detectable) communities in the network [9], their number will therefore tell us the number of (detectable) communities in the graph, and we will use them to infer the community membership of the nodes, by using a standard clustering algorithm such as k-means.\n\n2.2 Hessian of the Bethe free energy\n\nLet us define a pairwise Ising model on the graph G by the joint probability distribution:\n\nP({x}) = (1/Z) exp( Σ_{(i,j)∈E} atanh(1/r) xi xj ) ,  (8)\n\nwhere {x} := {xi}i∈{1..n} ∈ {±1}^n is a set of binary random variables sitting on the nodes of the graph G. The regularizer r is here a parameter that controls the strength of the interaction between the variables: the larger |r| is, the weaker is the interaction.\nIn order to study this model, a standard approach in machine learning is the Bethe approximation [21] in which the means ⟨xi⟩ and moments ⟨xixj⟩ are approximated by the parameters mi and ξij that minimize the so-called Bethe free energy FBethe({mi},{ξij}) defined as\n\nFBethe({mi},{ξij}) = − Σ_{(i,j)∈E} atanh(1/r) ξij + Σ_{(i,j)∈E} Σ_{xi,xj} η( (1 + mixi + mjxj + ξijxixj)/4 ) + Σ_{i∈V} (1 − di) Σ_{xi} η( (1 + mixi)/2 ) ,  (9)\n\nwhere η(x) := x ln x. Such an approach allows one, for instance, to derive the belief propagation (BP) algorithm. Here, however, we wish to restrict to a spectral one. At very high r the minimum of the Bethe free energy is given by the so-called paramagnetic point mi = 0, ξij = 1/r. It turns out [14] that mi = 0, ξij = 1/r is a stationarity point of the Bethe free energy for every r. Instead of considering the complete Bethe free energy, we will consider only its behavior around the paramagnetic point. This can be expressed via the Hessian (matrix of second derivatives), which has been studied extensively, see e.g. [14], [17]. At the paramagnetic point, the blocks of the Hessian involving one derivative with respect to the ξij are 0, and the block involving two such derivatives is a positive definite diagonal matrix [23]. We will therefore, somewhat improperly, call Hessian the matrix\n\nℋij(r) = ∂²FBethe / ∂mi∂mj |_{mi=0, ξij=1/r} .  (10)\n\nIn particular, at the paramagnetic point:\n\nℋ(r) = 1 + D/(r² − 1) − rA/(r² − 1) = H(r)/(r² − 1) .  (11)\n\nA more general expression of the Bethe Hessian in the case of weighted interactions atanh(wij/r) (with weights rescaled to be in [0, 1]) is given by eq. (3). All eigenvectors of ℋ(r) and H(r) are the same, as are the eigenvalues up to a multiplicative, positive factor (since we consider only |r| > 1). The paramagnetic point is stable iff ℋ(r) is positive definite. The appearance of each negative eigenvalue of the Hessian corresponds to a phase transition in the Ising model at which a new cluster (or a set of clusters) starts to be identifiable. The corresponding eigenvector will give the direction towards the cluster labeling. This motivates the use of the Bethe Hessian for spectral clustering.\nFor tree-like graphs such as those generated by the SBM, model (8) can be studied analytically in the asymptotic limit n → ∞. The locations of the possible phase transitions in model (8) are also known from spin glass theory and the theory of phase transitions on random graphs (see e.g.
[14, 5, 4, 17]). For positive r the trivial ferromagnetic phase appears at r = c, while the transitions towards the phases corresponding to the hidden community structure arise between √c < r < c. For disassortative communities, the situation is symmetric with r < −√c. Interestingly, at r = ±√c, the model undergoes a spin glass phase transition. At this point all the relevant eigenvalues have passed in the negative side (all the possible transitions from the paramagnetic states to the hidden structure have taken place) while the bulk of non-informative ones remains positive. This scenario is illustrated in Fig. 1 for the case of two assortative clusters.\n\n3 The spectrum of the Bethe Hessian\n\nThe spectral density of the Bethe Hessian can be computed analytically on tree-like graphs such as those generated by the stochastic block model. This will serve two goals: i) to justify independently our choice for the value of the regularizer r and ii) to show that for all values of r, the bulk of uninformative eigenvalues remains in the positive region. The spectral density is defined by:\n\nν(λ) = (1/n) Σ_{i=1}^n δ(λ − λi) ,  (12)\n\nwhere the λi's are the eigenvalues of the Bethe Hessian. It can be shown [18] that it is also given by\n\nν(λ) = (1/πn) Σ_{i=1}^n Im Δi(λ) ,  (13)\n\nwhere the Δi are complex variables living on the vertices of the graph G, which are given by:\n\nΔi = ( −λ + r² + di − 1 − r² Σ_{l∈∂i} Δl→i )⁻¹ ,  (14)\n\nwhere di is the degree of node i in the graph, and ∂i is the set of neighbors of i. The Δi→j are the (linearly stable) solution of the following belief propagation recursion, or cavity method [13],\n\nΔi→j = ( −λ + r² + di − 1 − r² Σ_{l∈∂i\\j} Δl→i )⁻¹ .  (15)\n\nThe ingredients to derive this formula are to turn the computation of the spectral density into a marginalization problem for a graphical model on the graph G, and then write the belief propagation equations to solve it. It can be shown [3] that this approach leads to an asymptotically exact description of the spectral density on random graphs such as those generated by the stochastic block model, which are locally tree-like in the limit where n → ∞. We can solve equation (15) numerically using a population dynamics algorithm [13]: starting from a pool of variables, we iterate by drawing at each step a variable, its excess degree and its neighbors from the pool, and updating its value according to (15). The results are shown in Fig. 1: the bulk of the spectrum is always positive.\nWe now justify analytically that the bulk of eigenvalues of the Bethe Hessian reaches 0 at r = √ρ(B). From (13) and (14), we see that if the linearly stable solution of (15) is real, then the corresponding spectral density will be equal to 0. We want to show that there exists an open set U ⊂ R around 0 in which there exists a real, stable, solution to the BP recursion. Let us call Δ ∈ R^{2m}, where m is the number of edges in G, the vector whose components are the Δi→j.
We introduce the function F : (λ, Δ) ∈ R^{2m+1} → F(λ, Δ) ∈ R^{2m} defined by\n\nF(λ, Δ)i→j = Δi→j ( −λ + r² + di − 1 − r² Σ_{l∈∂i\\j} Δl→i ) − 1 ,  (16)\n\nso that equation (15) can be rewritten as\n\nF(λ, Δ) = 0 .  (17)\n\nIt is straightforward to check that when λ = 0, the assignment Δi→j = 1/r² is a real solution of (17). Furthermore, the Jacobian of F at this point reads\n\nJF(0, {1/r²}) = ( (−1, 0, . . . , 0)ᵀ | r²(r²1 − B) ) ,  (18)\n\nwhere B is the 2m × 2m non-backtracking operator and 1 is the 2m × 2m identity matrix. The square submatrix of the Jacobian containing the derivatives with respect to the messages Δi→j is therefore invertible whenever r > √ρ(B). From the continuous differentiability of F around (0, {1/r²}) and the implicit function theorem, there exists an open set V containing 0 such that for all λ ∈ V, there exists Δ̃(λ) ∈ R^{2m} solution of (17), and the function Δ̃ is continuous in λ. To show that the spectral density is indeed 0 in an open set around λ = 0, we need to show that this solution is linearly stable. Introducing the function Gλ : Δ ∈ R^{2m} → Gλ(Δ) ∈ R^{2m} defined by\n\nGλ(Δ)i→j = ( −λ + r² + di − 1 − r² Σ_{l∈∂i\\j} Δl→i )⁻¹ ,  (19)\n\nit is enough to show that the Jacobian of Gλ at the point Δ̃(λ) has all its eigenvalues smaller than 1 in modulus, for λ close to 0.
But since JGλ(Δ) is continuous in (λ, Δ) in the neighborhood of (0, Δ̃(0) = {1/r²}), and Δ̃(λ) is continuous in λ, it is enough to show that the spectral radius of JG0({1/r²}) is smaller than 1. We compute\n\nJG0({1/r²}) = (1/r²) B ,  (20)\n\nso that the spectral radius of JG0({1/r²}) is ρ(B)/r², which is (strictly) smaller than 1 as long as r > √ρ(B). From the continuity of the eigenvalues of a matrix with respect to its entries, there exists an open set U ⊂ V containing 0 such that ∀λ ∈ U, the solution Δ̃ of the BP recursion (15) is real, so that the corresponding spectral density in U is equal to 0. This proves that the bulk of the spectrum of H reaches 0 at r = rc = √ρ(B), further justifying our choice for the regularizer.\n\n4 Numerical results\n\n4.1 Synthetic networks\n\nWe illustrate the efficiency of the algorithm for graphs generated by the stochastic block model. Fig. 2 shows the performance of standard spectral clustering methods, as well as that of the belief propagation (BP) algorithm of [4], believed to be asymptotically optimal in large tree-like graphs. The performance is measured in terms of the overlap with the true labeling, defined as\n\n( (1/N) Σ_u δ_{gu,g̃u} − 1/q ) / ( 1 − 1/q ) ,  (21)\n\nwhere gu is the true group label of node u, and g̃u is the label given by the algorithm, and we maximize over all q! possible permutations of the groups. The Bethe Hessian systematically outperforms B and does almost as well as BP, which is a more complicated algorithm, that we have run here assuming the knowledge of “oracle parameters”: the number of communities, their sizes, and the matrix pab [5, 4].
The Bethe Hessian, on the other hand, is non-parametric and infers the number of communities in the graph by counting the number of negative eigenvalues.\n\n4.2 Real networks\n\nWe finally turn towards actual real graphs to illustrate the performances of our approach, and to show that even if real networks are not generated by the stochastic block model, the Bethe Hessian operator remains a useful tool. In Table 1 we give the overlap and the number of groups to be identified. We limited our experiments to this list of networks because they have known, “ground truth” clusters. For each case we observed a large correlation to the ground truth, and at least equal (and sometimes better) performances with respect to the non-backtracking operator. The overlap was computed assuming knowledge of the number of ground-truth clusters. The number of clusters is correctly given by the number of negative eigenvalues of the Bethe Hessian in all the presented cases except for the political blogs network (10 predicted clusters) and the football network (10 predicted clusters). These differences either question the statistical significance of some of the human-decided labelling, or suggest the existence of additional relevant clusters. It is also interesting to note that our approach works not only in the assortative case but also in the disassortative one, for instance for the word adjacency networks. A Matlab implementation to reproduce the results of the Bethe Hessian for both real and synthetic networks is provided as supplementary material.\n\n5 Conclusion and perspectives\n\nWe have presented here a new approach to spectral clustering using the Bethe Hessian and given evidence that this approach combines the advantages of standard sparse symmetric real matrices, with\n\nFigure 2: Performance of spectral clustering applied to graphs of size n = 10^5 generated from the stochastic block model.
Each point is averaged over 20 such graphs. Left: assortative case with\nq = 2 clusters (theoretical transition at 3.46); middle: disassortative case with q = 2 (theoretical\ntransition at -3.46); right: assortative case with q = 3 clusters (theoretical transition at 5.20). For\nq = 2, we clustered according to the signs of the components of the eigenvector corresponding to\nthe second most negative eigenvalue of the Bethe Hessian operator. For q = 3, we used k-means on\nthe 3 \u201cnegative\u201d eigenvectors. While both the standard adjacency (A) and symmetrically normalized\nLaplacian (D\u22121/2(D\u2212A)D\u22121/2) approaches fail to identify clusters in a large relevant region, both\nthe non-backtracking (B) and the Bethe Hessian (BH) approaches identify clusters almost as well as\nusing the more complicated belief propagation (BP) with oracle parameters. Note, however, that the\nBethe Hessian systematically outperforms the non-backtracking operator, at a smaller computational\ncost. Additionally, clustering with the adjacency matrix and the normalized laplacian are run on the\nlargest connected component, while the Bethe Hessian doesn\u2019t require any kind of pre-processing\nof the graph. While our theory explains why clustering with the Bethe Hessian gives a positive\noverlap whenever clustering with B does, we currently don\u2019t have an explanation as to why the\nBethe Hessian overlap is actually larger.\n\nTable 1: Overlap for some commonly used benchmarks for community detection, computed using\nthe signs of the second eigenvector for the networks with two communities, and using k-means\nfor those with three and more communities, compared to the man-made group assignment. The\nnon-backtracking operator detects communities in all these networks, with an overlap comparable\nto the performance of other spectral methods. 
The Bethe Hessian systematically either equals or\noutperforms the results obtained by the non-backtracking operator.\n\nPART\n\nNon-backtracking [9] Bethe Hessian\n\nPolbooks (q = 3) [1]\nPolblogs (q = 2) [10]\nKarate (q = 2) [24]\nFootball (q = 12) [6]\nDolphins (q = 2) [16]\nAdjnoun (q = 2) [8]\n\n0.742857\n0.864157\n\n1\n\n0.924111\n0.741935\n0.625000\n\n0.757143\n0.865794\n\n1\n\n0.924111\n0.806452\n0.660714\n\nthe performances of the more involved non-backtracking operator, or the use of the belief propaga-\ntion algorithm with oracle parameters. Advantages over other spectral methods are that the number\nof negative eigenvalues provides an estimate of the number of clusters, there is a well-de\ufb01ned way\nto set the parameter r, making the algorithm tuning-parameter free, and it is guaranteed to detect the\ncommunities generated from the stochastic block model down to the theoretical limit. This answers\nthe quest for a tractable non-parametric approach that performs optimally in the stochastic block\nmodel. Given the large impact and the wide use of spectral clustering methods in many \ufb01elds of\nmodern science, we thus expect that our method will have a signi\ufb01cant impact on data analysis.\n\n8\n\n34500.20.40.60.81cin\u2212coutoverlapq=2BHBANorm.Lap.BP-5-4-300.20.40.60.81cin\u2212coutq=2BHBANorm.Lap.BP567800.20.40.60.81cin\u2212coutq=3BHBANorm.Lap.BP\fReferences\n[1] L. A Adamic and N. Glance. The political blogosphere and the 2004 us election: divided they\nblog. In Proceedings of the 3rd international workshop on Link discovery, page 36. ACM,\n2005.\n\n[2] P. J Bickel and A. Chen. A nonparametric view of network models and newman\u2013girvan and\n\nother modularities. Proceedings of the National Academy of Sciences, 106(50):21068, 2009.\n\n[3] Charles Bordenave and Marc Lelarge. Resolvent of large random graphs. Random Structures\n\nand Algorithms, 37(3):332\u2013352, 2010.\n\n[4] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborov\u00b4a. 
Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E, 84(6):066106, 2011.

[5] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Inference and phase transitions in the detection of modules in sparse networks. Phys. Rev. Lett., 107(6):065701, 2011.

[6] Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

[7] Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109, 1983.

[8] Valdis Krebs. The network can be found on http://www.orgnet.com/.

[9] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013.

[10] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54(4):396–405, 2003.

[11] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395, 2007.

[12] Laurent Massoulie. Community detection thresholds and the weak Ramanujan property. arXiv preprint arXiv:1311.3085, 2013.

[13] M. Mezard and A. Montanari. Information, Physics, and Computation. Oxford University Press, 2009.

[14] Joris M. Mooij, Hilbert J. Kappen, et al. Validity estimates for loopy belief propagation on binary real-world networks. In NIPS, 2004.

[15] Elchanan Mossel, Joe Neeman, and Allan Sly. A proof of the block model threshold conjecture. arXiv preprint arXiv:1311.4115, 2013.

[16] Mark EJ Newman. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev.
E, 74(3):036104, 2006.

[17] F. Ricci-Tersenghi. The Bethe approximation for solving the inverse Ising problem: a comparison with other inference methods. J. Stat. Mech.: Th. and Exp., page P08015, 2012.

[18] Tim Rogers, Isaac Pérez Castillo, Reimer Kühn, and Koujin Takeda. Cavity approach to the spectral density of sparse symmetric random matrices. Phys. Rev. E, 78(3):031116, 2008.

[19] Axel Ruhe. Algorithms for the nonlinear eigenvalue problem. SIAM Journal on Numerical Analysis, 10(4):674–689, 1973.

[20] Alaa Saade, Florent Krzakala, and Lenka Zdeborová. Spectral density of the non-backtracking operator on random graphs. EPL, 107(5):50005, 2014.

[21] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1, 2008.

[22] Yuchung J. Wang and George Y. Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8–19, 1987.

[23] Yusuke Watanabe and Kenji Fukumizu. Graph zeta function in the Bethe free energy and loopy belief propagation. In NIPS, pages 2017–2025, 2009.

[24] W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.