{"title": "Optimal Cluster Recovery in the Labeled Stochastic Block Model", "book": "Advances in Neural Information Processing Systems", "page_first": 965, "page_last": 973, "abstract": "We consider the problem of community detection or clustering in the labeled Stochastic Block Model (LSBM) with a finite number $K$ of clusters of sizes linearly growing with the global population of items $n$. Every pair of items is labeled independently at random, and label $\\ell$ appears with probability $p(i,j,\\ell)$ between two items in clusters indexed by $i$ and $j$, respectively. The objective is to reconstruct the clusters from the observation of these random labels. Clustering under the SBM and their extensions has attracted much attention recently. Most existing work aimed at characterizing the set of parameters such that it is possible to infer clusters either positively correlated with the true clusters, or with a vanishing proportion of misclassified items, or exactly matching the true clusters. We find the set of parameters such that there exists a clustering algorithm with at most $s$ misclassified items in average under the general LSBM and for any $s=o(n)$, which solves one open problem raised in \\cite{abbe2015community}. We further develop an algorithm, based on simple spectral methods, that achieves this fundamental performance limit within $O(n \\mbox{polylog}(n))$ computations and without the a-priori knowledge of the model parameters.", "full_text": "Optimal Cluster Recovery\n\nin the Labeled Stochastic Block Model\n\nSe-Young Yun\n\nCNLS, Los Alamos National Lab.\n\nLos Alamos, NM 87545\n\nsyun@lanl.gov\n\nAlexandre Proutiere\n\nAutomatic Control Dept., KTH\n\nStockholm 100-44, Sweden\n\nalepro@kth.se\n\nAbstract\n\nWe consider the problem of community detection or clustering in the labeled\nStochastic Block Model (LSBM) with a \ufb01nite number K of clusters of sizes\nlinearly growing with the global population of items n. 
Every pair of items is labeled independently at random, and label ℓ appears with probability p(i, j, ℓ) between two items in clusters indexed by i and j, respectively. The objective is to reconstruct the clusters from the observation of these random labels. Clustering under the SBM and its extensions has attracted much attention recently. Most existing work aimed at characterizing the set of parameters such that it is possible to infer clusters either positively correlated with the true clusters, or with a vanishing proportion of misclassified items, or exactly matching the true clusters. We find the set of parameters such that there exists a clustering algorithm with at most s misclassified items on average under the general LSBM and for any s = o(n), which solves an open problem raised in [2]. We further develop an algorithm, based on simple spectral methods, that achieves this fundamental performance limit within O(n polylog(n)) computations and without a-priori knowledge of the model parameters.

1 Introduction

Community detection consists in extracting (a few) groups of similar items from a large global population, and has applications in a wide spectrum of disciplines including social sciences, biology, computer science, and statistical physics. The communities or clusters of items are inferred from the observed pair-wise similarities between items, which, most often, are represented by a graph whose vertices are items and whose edges are pairs of items known to share similar features.

The stochastic block model (SBM), introduced three decades ago in [12], constitutes a natural performance benchmark for community detection, and has been widely studied since then. In the SBM, the set of items V = {1, ..., n} is partitioned into K non-overlapping clusters V1, ..., VK, which have to be recovered from an observed realization of a random graph.
In the latter, an edge between two items belonging to clusters Vi and Vj, respectively, is present with probability p(i, j), independently of other edges. The analyses presented in this paper apply to the SBM, but also to the labeled stochastic block model (LSBM) [11], a more general model describing the similarities of items. There, the observation of the similarity between two items comes in the form of a label taken from a finite set L = {0, 1, ..., L}, and label ℓ is observed between two items in clusters Vi and Vj, respectively, with probability p(i, j, ℓ), independently of other labels. The standard SBM can be seen as a particular instance of its labeled counterpart with two possible labels 0 and 1, where the edges present (resp. absent) in the SBM correspond to item pairs with label 1 (resp. 0). The problem of cluster recovery under the LSBM consists in inferring the hidden partition V1, ..., VK from the observation of the random labels on each pair of items.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Over the last few years, we have seen remarkable progress on the problem of cluster recovery under the SBM (see [7] for an exhaustive literature review), highlighting its scientific relevance and richness. Most recent work on the SBM aimed at characterizing the set of parameters (i.e., the probabilities p(i, j) that there exists an edge between nodes in clusters i and j for 1 ≤ i, j ≤ K) such that some qualitative recovery objectives can or cannot be met. For sparse scenarios where the average degree of items in the graph is O(1), parameters under which it is possible to extract clusters positively correlated with the true clusters have been identified [5, 18, 16].
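The LSBM generative process described above is straightforward to simulate. The following minimal sketch makes the model concrete; the two-cluster instance at the bottom is purely illustrative (its cluster probabilities and label distributions are hypothetical, not taken from the paper):

```python
import numpy as np

def sample_lsbm(n, alpha, p, rng=None):
    """Sample cluster assignments and pairwise labels from an LSBM.

    alpha: length-K vector of cluster probabilities (sums to 1).
    p: array of shape (K, K, L+1); p[i, j] is the label distribution
       for a pair whose endpoints lie in clusters i and j.
    Returns (sigma, labels) with labels[v, w] = labels[w, v] in {0, ..., L}.
    """
    rng = np.random.default_rng(rng)
    K = len(alpha)
    sigma = rng.choice(K, size=n, p=alpha)        # item-to-cluster assignment
    labels = np.zeros((n, n), dtype=int)
    for v in range(n):
        for w in range(v + 1, n):                 # each pair labeled once
            dist = p[sigma[v], sigma[w]]
            lab = rng.choice(len(dist), p=dist)
            labels[v, w] = labels[w, v] = lab
    return sigma, labels

# Hypothetical two-cluster instance with labels {0, 1}, i.e. a plain SBM:
# label 1 plays the role of "edge present".
alpha = np.array([0.5, 0.5])
p = np.array([[[0.90, 0.10], [0.98, 0.02]],
              [[0.98, 0.02], [0.90, 0.10]]])
sigma, labels = sample_lsbm(50, alpha, p, rng=0)
```

With L = 1 this reduces to sampling an SBM adjacency matrix, matching the correspondence between the SBM and its labeled counterpart noted above.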
When the average degree of the graph is ω(1), one may predict the set of parameters allowing cluster recovery with a vanishing (as n grows large) proportion of misclassified items [22, 17], but one may also characterize parameters for which an asymptotically exact cluster reconstruction can be achieved [1, 21, 8, 17, 2, 3, 13].

In this paper, we address the finer and more challenging question of determining, under the general LSBM, the minimal number of misclassified items given the parameters of the model. Specifically, for any given s = o(n), our goal is to identify the set of parameters such that it is possible to devise a clustering algorithm with at most s misclassified items. Of course, if we achieve this goal, we shall recover all the aforementioned results on the SBM.

Main results. We focus on the labeled SBM as described above, where each item is assigned to cluster Vk with probability αk > 0, independently of other items. We assume w.l.o.g. that α1 ≤ α2 ≤ ··· ≤ αK. We further assume that α = (α1, ..., αK) does not depend on the total population of items n. Conditionally on the assignment of items to clusters, the pair or edge (v, w) ∈ V² has label ℓ ∈ L = {0, 1, ..., L} with probability p(i, j, ℓ), when v ∈ Vi and w ∈ Vj. W.l.o.g., 0 is the most frequent label, i.e., $0 = \arg\max_{\ell} \sum_{i=1}^{K} \sum_{j=1}^{K} \alpha_i \alpha_j p(i, j, \ell)$. Throughout the paper, we typically assume that p̄ = o(1) and p̄n = ω(1), where p̄ = max_{i,j,ℓ≥1} p(i, j, ℓ) denotes the maximum probability of observing a label different from 0. We shall explicitly state whether these assumptions are made when deriving our results. In the standard SBM, the second assumption means that the average degree of the corresponding random graph is ω(1).
This also means that we can hope to recover clusters with a vanishing proportion of misclassified items. We finally make the following assumptions: there exist positive constants η and ε such that for every i, j, k ∈ [K] = {1, ..., K} (with i ≠ j in (A2)),

$$(A1) \quad \forall \ell \in L, \quad \frac{p(i, j, \ell)}{p(i, k, \ell)} \le \eta, \qquad \text{and} \qquad (A2) \quad \frac{\sum_{k=1}^{K} \sum_{\ell=1}^{L} (p(i, k, \ell) - p(j, k, \ell))^2}{\bar{p}^2} \ge \varepsilon.$$

(A2) imposes a certain separation between the clusters. For example, in the standard SBM with two communities, p(1, 1, 1) = p(2, 2, 1) = ξ and p(1, 2, 1) = ζ, (A2) is equivalent to 2(ξ − ζ)²/ξ² ≥ ε. In summary, the LSBM is parametrized by α and p = (p(i, j, ℓ))_{1≤i,j≤K, 0≤ℓ≤L}; recall that α does not depend on n, whereas p does.

For the above LSBM, we derive, for any arbitrary s = o(n), a necessary condition under which there exists an algorithm inferring clusters with s misclassified items. We further establish that under this condition, a simple extension of spectral algorithms extracts communities with fewer than s misclassified items. To formalize these results, we introduce the divergence of (α, p). We denote by p(i) the K × (L + 1) matrix whose element on the j-th row and the (ℓ + 1)-th column is p(i, j, ℓ), and denote by p(i, j) ∈ [0, 1]^{L+1} the vector describing the probability distribution of the label of a pair of items in Vi and Vj, respectively. Let P^{K×(L+1)} denote the set of K × (L + 1) matrices such that each row represents a probability distribution.
The divergence D(α, p) of (α, p) is defined as follows: $D(\alpha, p) = \min_{i,j: i \neq j} D_{L+}(\alpha, p(i), p(j))$ with

$$D_{L+}(\alpha, p(i), p(j)) = \min_{y \in \mathcal{P}^{K \times (L+1)}} \max\left\{ \sum_{k=1}^{K} \alpha_k \mathrm{KL}(y(k), p(i, k)), \; \sum_{k=1}^{K} \alpha_k \mathrm{KL}(y(k), p(j, k)) \right\},$$

where KL denotes the Kullback-Leibler divergence between two label distributions, i.e., $\mathrm{KL}(y(k), p(i, k)) = \sum_{\ell=0}^{L} y(k, \ell) \log \frac{y(k, \ell)}{p(i, k, \ell)}$. Finally, we denote by ε^π(n) the number of misclassified items under the clustering algorithm π, and by E[ε^π(n)] its expectation (with respect to the randomness in the LSBM and in the algorithm).

We first derive a tight lower bound on the average number of misclassified items when the latter is o(n). Note that such a bound was unknown even for the SBM [2].

Theorem 1 Assume that (A1) and (A2) hold, and that p̄n = ω(1). Let s = o(n). If there exists a clustering algorithm π misclassifying on average fewer than s items asymptotically, i.e., $\limsup_{n\to\infty} E[\varepsilon^{\pi}(n)]/s \le 1$, then the parameters (α, p) of the LSBM satisfy:

$$\liminf_{n\to\infty} \frac{n D(\alpha, p)}{\log(n/s)} \ge 1. \quad (1)$$

To state the corresponding positive result (i.e., the existence of an algorithm misclassifying only s items), we make an additional assumption to avoid extremely sparse labels: (A3) there exists a constant κ > 0 such that np(j, i, ℓ) ≥ (np̄)^κ for all i, j and ℓ ≥ 1.

Theorem 2 Assume that (A1), (A2), and (A3) hold, and that p̄ = o(1), p̄n = ω(1). Let s = o(n).
If the parameters (α, p) of the LSBM satisfy (1), then the Spectral Partition (SP) algorithm presented in Section 4 misclassifies at most s items with high probability, i.e., $\lim_{n\to\infty} \mathbb{P}[\varepsilon^{SP}(n) \le s] = 1$.

These theorems indicate that under the LSBM with parameters satisfying (A1) and (A2), the number of misclassified items scales at least as n exp(−nD(α, p)(1 + o(1))) under any clustering algorithm, irrespective of its complexity. They further establish that the Spectral Partition algorithm reaches this fundamental performance limit under the additional condition (A3). We note that the SP algorithm runs in polynomial time, i.e., it requires O(n²p̄ log(n)) floating-point operations.

We further establish a necessary and sufficient condition on the parameters of the LSBM for the existence of a clustering algorithm recovering the clusters exactly with high probability. Deriving such a condition was also open [2].

Theorem 3 Assume that (A1) and (A2) hold. If there exists a clustering algorithm that does not misclassify any item with high probability, then the parameters (α, p) of the LSBM satisfy $\liminf_{n\to\infty} \frac{n D(\alpha, p)}{\log(n)} \ge 1$. If this condition holds, then under (A3), the SP algorithm recovers the clusters exactly with high probability.

The paper is organized as follows. Section 2 presents related work and examples of application of our results. In Section 3, we sketch the proof of Theorem 1, which leverages change-of-measure and coupling arguments. We present in Section 4 the Spectral Partition algorithm and analyze its performance (we outline the proof of Theorem 2). All results are proved in detail in the supplementary material.

2 Related Work and Applications

2.1 Related work

Cluster recovery in the SBM has attracted a lot of attention recently. We summarize existing results below and compare them to ours.
Results are categorized depending on the targeted level of performance. First, we consider the notion of detectability, the lowest level of performance, requiring only that the extracted clusters are positively correlated with the true clusters. Second, we look at asymptotically accurate recovery, stating that the proportion of misclassified items vanishes as n grows large. Third, we present existing results regarding exact cluster recovery, which means that no item is misclassified. Finally, we report recent work whose objective, like ours, is to characterize the optimal cluster recovery rate.

Detectability. Necessary and sufficient conditions for detectability have been studied for the binary symmetric SBM (i.e., L = 1, K = 2, α1 = α2, p(1, 1, 1) = p(2, 2, 1) = ξ, and p(1, 2, 1) = p(2, 1, 1) = ζ). In the sparse regime where ξ, ζ = o(1), the main focus has been on identifying the phase-transition threshold (a condition on ξ and ζ) for detectability: it was conjectured in [5] that if $n(\xi - \zeta) < \sqrt{2n(\xi + \zeta)}$ (i.e., below the threshold), no algorithm can perform better than a simple random assignment of items to clusters, and that above the threshold, clusters can be partially recovered. The conjecture was recently proved in [18] (necessary condition) and [16] (sufficient condition). The problem of detectability has also been studied recently in [24] for the asymmetric SBM with more than two clusters of possibly different sizes. Interestingly, it is shown that in most cases, the phase transition for detectability disappears.

The present paper is not concerned with conditions for detectability.
Indeed, detectability means that only a strictly positive proportion of items can be correctly classified, whereas here, we impose that the proportion of misclassified items vanishes as n grows large.

Asymptotically accurate recovery. A necessary and sufficient condition for asymptotically accurate recovery in the SBM (with any number of clusters of different but linearly increasing sizes) has been derived in [22] and [17]. Using our notion of divergence specialized to the SBM, this condition is nD(α, p) = ω(1). Our results are more precise, since the minimal achievable number of misclassified items is characterized, and apply to a broader setting, since they are valid for the generic LSBM.

Asymptotically exact recovery. Conditions for exact cluster recovery in the SBM have also been studied recently. [1, 17, 8] provide a necessary and sufficient condition for asymptotically exact recovery in the binary symmetric SBM. For example, it is shown that when ξ = a log(n)/n and ζ = b log(n)/n for a > b, clusters can be recovered exactly if and only if $\frac{a+b}{2} - \sqrt{ab} \ge 1$. In [2, 3], the authors consider a more general SBM corresponding to our LSBM with L = 1. They define the CH-divergence as:

$$D_+(\alpha, p(i), p(j)) = \frac{n}{\log(n)} \max_{\lambda \in [0,1]} \sum_{k=1}^{K} \alpha_k \left( (1-\lambda) p(i, k, 1) + \lambda p(j, k, 1) - p(i, k, 1)^{1-\lambda} p(j, k, 1)^{\lambda} \right),$$

and show that $\min_{i \neq j} D_+(\alpha, p(i), p(j)) > 1$ is a necessary and sufficient condition for asymptotically exact reconstruction.
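The CH-divergence just defined is easy to evaluate numerically. The sketch below approximates the maximization over λ by a grid search (an approximation, since the true value maximizes over the continuum); the two-cluster instance is hypothetical, chosen so that the closed form $\frac{a+b}{2} - \sqrt{ab}$ recalled above can serve as a check:

```python
import numpy as np

def ch_divergence(alpha, p_i, p_j, n, grid=1001):
    """Grid-search approximation of the CH-divergence D+(alpha, p(i), p(j)).

    alpha: length-K cluster proportions; p_i, p_j: length-K vectors of
    edge probabilities p(i, k, 1) and p(j, k, 1); n: number of items.
    """
    lam = np.linspace(0.0, 1.0, grid)[:, None]       # grid x 1 column
    pi, pj = np.asarray(p_i)[None, :], np.asarray(p_j)[None, :]
    inner = (1 - lam) * pi + lam * pj - pi ** (1 - lam) * pj ** lam
    return n / np.log(n) * float(np.max(inner @ np.asarray(alpha)))

# Hypothetical symmetric two-cluster instance: intra-cluster edge rate
# a*log(n)/n, inter-cluster rate b*log(n)/n; by symmetry the maximum is
# attained at lambda = 1/2, giving D+ = (a+b)/2 - sqrt(a*b).
n, a, b = 10_000, 5.0, 1.0
alpha = [0.5, 0.5]
p1 = [a * np.log(n) / n, b * np.log(n) / n]
p2 = [b * np.log(n) / n, a * np.log(n) / n]
d_plus = ch_divergence(alpha, p1, p2, n)
```

For this instance d_plus is below 1, so the exact-recovery condition min_{i≠j} D+ > 1 fails, consistent with the binary threshold $(a+b)/2 - \sqrt{ab} \ge 1$.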
The following claim, proven in the supplementary material, relates D+ to DL+.

Claim 4 When p̄ = o(1), we have for all i, j:

$$D_{L+}(\alpha, p(i), p(j)) \underset{n\to\infty}{\sim} \max_{\lambda \in [0,1]} \sum_{k=1}^{K} \alpha_k \sum_{\ell=1}^{L} \left( (1-\lambda) p(i, k, \ell) + \lambda p(j, k, \ell) - p(i, k, \ell)^{1-\lambda} p(j, k, \ell)^{\lambda} \right).$$

Thus, the results in [2, 3] are obtained by applying Theorem 3 and Claim 4.

In [13], the authors consider a symmetric labeled SBM where communities are balanced (i.e., αk = 1/K for all k) and where label probabilities are simply defined as p(i, i, ℓ) = p(ℓ) for all i and p(i, j, ℓ) = q(ℓ) for all i ≠ j. It is shown that $\frac{nI}{\log(n)} > 1$ is necessary and sufficient for asymptotically exact recovery, where $I = -\frac{2}{K} \log\left( \sum_{\ell=0}^{L} \sqrt{p(\ell) q(\ell)} \right)$. We can relate I to D(α, p):

Claim 5 In the LSBM with K clusters, if p̄ = o(1), and for all i, j, ℓ such that i ≠ j, αi = 1/K, p(i, i, ℓ) = p(ℓ), and p(i, j, ℓ) = q(ℓ), we have: $D(\alpha, p) \underset{n\to\infty}{\sim} -\frac{2}{K} \log\left( \sum_{\ell=0}^{L} \sqrt{p(\ell) q(\ell)} \right)$.

Again from this claim, the results derived in [13] are obtained by applying Theorem 3 and Claim 5.

Optimal recovery rate. In [6, 19], the authors consider the binary SBM in the sparse regime where the average degree of items in the graph is O(1), and identify the minimal number of misclassified items for very specific intra- and inter-cluster edge probabilities ξ and ζ. Again, the sparse regime is out of the scope of the present paper.
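The rate I appearing in Claim 5 above is a Bhattacharyya-type quantity and is trivial to compute. The sketch below evaluates it for a hypothetical L = 1 instance (a plain symmetric SBM with illustrative rates); for small label probabilities it is close to the Hellinger-style expression $\frac{2}{K}\big(\frac{p+q}{2} - \sqrt{pq}\big)$, which the test uses as a sanity check:

```python
import numpy as np

def recovery_rate_I(p_dist, q_dist, K):
    """I = -(2/K) * log( sum_l sqrt(p(l) q(l)) ) from Claim 5, where p_dist
    and q_dist are the intra- and inter-cluster label distributions over
    {0, ..., L}."""
    bc = np.sum(np.sqrt(np.asarray(p_dist) * np.asarray(q_dist)))
    return -2.0 / K * np.log(bc)   # bc is the Bhattacharyya coefficient

# Hypothetical instance: label 1 ("edge") appears with rate 5*log(n)/n
# inside clusters and log(n)/n across clusters, K = 2 balanced clusters.
n, K = 10_000, 2
pe, qe = 5 * np.log(n) / n, np.log(n) / n
I = recovery_rate_I([1 - pe, pe], [1 - qe, qe], K)
exact = n * I / np.log(n) > 1      # exact recovery iff n*I/log(n) > 1
```

Here n·I/log(n) ≈ 0.76 < 1, so exact recovery is not achievable for these illustrative rates, matching the binary threshold discussed earlier.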
[23, 7] are concerned with the general SBM, corresponding to our LSBM with L = 1, and with regimes where asymptotically accurate recovery is possible. The authors first characterize the optimal recovery rate in a minimax framework. More precisely, they consider a (potentially large) set of possible parameters (α, p), and provide a lower bound on the expected number of misclassified items for the worst parameters in this set. Our lower bound (Theorem 1) is more precise, as it is model-specific, i.e., we provide the minimal expected number of misclassified items for a given parameter (α, p) (and for a more general class of models). The authors then propose a clustering algorithm, with time complexity O(n³ log(n)), achieving their minimax recovery rate. In comparison, our algorithm achieves the optimal recovery rate for any given parameter (α, p), exhibits a lower running time of O(n²p̄ log(n)), and applies to the generic LSBM.

2.2 Applications

We provide here a few examples of application of our results, illustrating their versatility. In all examples, f(n) is a function such that f(n) = ω(1), and a, b are fixed real numbers such that a > b.

The binary SBM. Consider the binary SBM where the average item degree is Θ(f(n)), represented by an LSBM with parameters L = 1, K = 2, α = (α1, 1 − α1), p(1, 1, 1) = p(2, 2, 1) = af(n)/n, and p(1, 2, 1) = p(2, 1, 1) = bf(n)/n. From Theorems 1 and 2, the optimal number of misclassified vertices scales as n exp(−g(α1, a, b)f(n)(1 + o(1))) when α1 ≤ 1/2 (w.l.o.g.), where

$$g(\alpha_1, a, b) := \max_{\lambda \in [0,1]} \; (1 - \alpha_1 - \lambda + 2\alpha_1\lambda) a + (\alpha_1 + \lambda - 2\alpha_1\lambda) b - \alpha_1 a^{\lambda} b^{1-\lambda} - (1 - \alpha_1) a^{1-\lambda} b^{\lambda}.$$

It can be easily checked that $g(\alpha_1, a, b) \ge g(1/2, a, b) = \frac{1}{2}(\sqrt{a} - \sqrt{b})^2$ (letting λ = 1/2). The worst case is hence obtained when the two clusters are of equal sizes. When f(n) = log(n), we also note that the condition for asymptotically exact recovery is g(α1, a, b) ≥ 1.

Recovering a single hidden community. As in [9], consider a random graph model with a hidden community consisting of αn vertices; edges between vertices belonging to the hidden community are present with probability af(n)/n, and edges between other pairs are present with probability bf(n)/n. This is modeled by an LSBM with parameters K = 2, L = 1, α1 = α, p(1, 1, 1) = af(n)/n, and p(1, 2, 1) = p(2, 1, 1) = p(2, 2, 1) = bf(n)/n. The minimal number of misclassified items when searching for the hidden community scales as n exp(−h(α, a, b)f(n)(1 + o(1))) where

$$h(\alpha, a, b) := \alpha \left( a - (a - b)\,\frac{1 + \log(a - b) - \log(a \log(a/b))}{\log(a/b)} \right).$$

When f(n) = log(n), the condition for asymptotically exact recovery of the hidden community is h(α, a, b) ≥ 1.

Optimal sampling for community detection under the SBM. Consider a dense binary symmetric SBM with intra- and inter-cluster edge probabilities a and b. In practice, to recover the clusters, one might not be able to observe the entire random graph, but instead sample its vertex (here item) pairs, as considered in [22]. Assume for instance that any pair of vertices is sampled with probability δf(n)/n for some fixed δ > 0, independently of other pairs.
We can model such a scenario using an LSBM with three labels, namely ×, 0 and 1, corresponding to the absence of observation (the vertex pair is not sampled), the observation of the absence of an edge, and the observation of the presence of an edge, respectively, with parameters p(i, j, ×) = 1 − δf(n)/n for all i, j ∈ {1, 2}, p(1, 1, 1) = p(2, 2, 1) = aδf(n)/n, and p(1, 2, 1) = p(2, 1, 1) = bδf(n)/n. The minimal number of misclassified vertices scales as n exp(−l(δ, a, b)f(n)(1 + o(1))) where $l(\delta, a, b) := \delta\left(1 - \sqrt{ab} - \sqrt{(1-a)(1-b)}\right)$. When f(n) = log(n), the condition for asymptotically exact recovery is l(δ, a, b) ≥ 1.

Signed networks. Signed networks [15, 20] are used in social sciences to model positive and negative interactions between individuals. These networks can be represented by an LSBM with three possible labels, namely 0, + and −, corresponding to the absence of interaction, positive interaction, and negative interaction, respectively. Consider such an LSBM with parameters: K = 2, α1 = α2, p(1, 1, +) = p(2, 2, +) = a₊f(n)/n, p(1, 1, −) = p(2, 2, −) = a₋f(n)/n, p(1, 2, +) = p(2, 1, +) = b₊f(n)/n, and p(1, 2, −) = p(2, 1, −) = b₋f(n)/n, for some fixed a₊, a₋, b₊, b₋ such that a₊ > b₊ and a₋ < b₋. The minimal number of misclassified individuals here scales as n exp(−m(α, a₊, a₋, b₊, b₋)f(n)(1 + o(1))) where

$$m(\alpha, a_+, a_-, b_+, b_-) := \frac{1}{2}\left( (\sqrt{a_+} - \sqrt{b_+})^2 + (\sqrt{a_-} - \sqrt{b_-})^2 \right).$$

When f(n) = log(n), the condition for asymptotically exact recovery is m(α, a₊, a₋, b₊, b₋) ≥ 1.

3 Fundamental Limits: Change of Measures through Coupling

In this section, we explain the construction of the proof of Theorem 1.
The latter relies on an appropriate change-of-measure argument, frequently used to identify upper performance bounds in online stochastic optimization problems [14]. In the following, we refer to Φ, defined by the parameters (α, p), as the true stochastic model under which all the observed random labels are generated, and denote by P_Φ = P (resp. E_Φ[·] = E[·]) the corresponding probability measure (resp. expectation). In our change-of-measure argument, we construct a second stochastic model Ψ (whose corresponding probability measure and expectation are P_Ψ and E_Ψ[·], respectively). Using a change of measures from P_Φ to P_Ψ, we relate the expected number of misclassified items E_Φ[ε^π(n)] under any clustering algorithm π to the expected (w.r.t. P_Ψ) log-likelihood ratio Q of the observed labels under P_Φ and P_Ψ. Specifically, we show that, roughly, log(n/E_Φ[ε^π(n)]) must be smaller than E_Ψ[Q] for n large enough.

Construction of Ψ. Let $(i^\star, j^\star) = \arg\min_{i,j: i \neq j} D_{L+}(\alpha, p(i), p(j))$, and let $q \in \mathcal{P}^{K \times (L+1)}$ be such that: $D(\alpha, p) = \sum_{k=1}^{K} \alpha_k \mathrm{KL}(q(k), p(i^\star, k)) = \sum_{k=1}^{K} \alpha_k \mathrm{KL}(q(k), p(j^\star, k))$.