{"title": "On the Limitation of Spectral Methods: From the Gaussian Hidden Clique Problem to Rank-One Perturbations of Gaussian Tensors", "book": "Advances in Neural Information Processing Systems", "page_first": 217, "page_last": 225, "abstract": "We consider the following detection problem: given a realization of asymmetric matrix $X$ of dimension $n$, distinguish between the hypothesisthat all upper triangular variables are i.i.d. Gaussians variableswith mean 0 and variance $1$ and the hypothesis that there is aplanted principal submatrix $B$ of dimension $L$ for which all upper triangularvariables are i.i.d. Gaussians with mean $1$ and variance $1$, whereasall other upper triangular elements of $X$ not in $B$ are i.i.d.Gaussians variables with mean 0 and variance $1$. We refer to this asthe `Gaussian hidden clique problem'. When $L=( 1 + \\epsilon) \\sqrt{n}$ ($\\epsilon > 0$), it is possible to solve thisdetection problem with probability $1 - o_n(1)$ by computing thespectrum of $X$ and considering the largest eigenvalue of $X$.We prove that when$L < (1-\\epsilon)\\sqrt{n}$ no algorithm that examines only theeigenvalues of $X$can detect the existence of a hiddenGaussian clique, with error probability vanishing as $n \\to \\infty$.The result above is an immediate consequence of a more general result on rank-oneperturbations of $k$-dimensional Gaussian tensors.In this context we establish a lower bound on the criticalsignal-to-noise ratio below which a rank-one signal cannot be detected.", "full_text": "On the Limitation of Spectral Methods:\n\nFrom the Gaussian Hidden Clique Problem to\nRank-One Perturbations of Gaussian Tensors\n\nDepartment of Electrical Engineering and Department of Statistics. 
Stanford University.

Andrea Montanari
montanari@stanford.edu

Daniel Reichman
Department of Cognitive and Brain Sciences, University of California, Berkeley, CA
daniel.reichman@gmail.com

Ofer Zeitouni
Faculty of Mathematics, Weizmann Institute, Rehovot 76100, Israel
and Courant Institute, New York University
ofer.zeitouni@weizmann.ac.il

Abstract

We consider the following detection problem: given a realization of a symmetric matrix X of dimension n, distinguish between the hypothesis that all upper triangular variables are i.i.d. Gaussian variables with mean 0 and variance 1 and the hypothesis that there is a planted principal submatrix B of dimension L for which all upper triangular variables are i.i.d. Gaussian variables with mean 1 and variance 1, whereas all other upper triangular elements of X not in B are i.i.d. Gaussian variables with mean 0 and variance 1. We refer to this as the 'Gaussian hidden clique problem'. When L = (1 + ε)√n (ε > 0), it is possible to solve this detection problem with probability 1 − o_n(1) by computing the spectrum of X and considering its largest eigenvalue. We prove that when L < (1 − ε)√n, no algorithm that examines only the eigenvalues of X can detect the existence of a hidden Gaussian clique with error probability vanishing as n → ∞. The result above is an immediate consequence of a more general result on rank-one perturbations of k-dimensional Gaussian tensors. In this context we establish a lower bound on the critical signal-to-noise ratio below which a rank-one signal cannot be detected.

1 Introduction

Consider the following detection problem. One is given a symmetric matrix X = X(n) of dimension n, such that the (n choose 2) + n entries (X_{i,j})_{i≤j} are mutually independent random variables.
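As a concrete illustration of this sampling model (a sketch using the Gaussian special case described in the abstract, F0 = N(0, 1) and F1 = N(1, 1); the helper name and parameters below are our own choices, and the diagonal is treated like the other entries for simplicity):

```python
import numpy as np

def hidden_submatrix_instance(n, L, planted, rng):
    """Sample a symmetric X with i.i.d. N(0,1) upper-triangular entries; under
    the planted hypothesis, entries indexed by a hidden set U of size L get
    mean 1. (Illustrative sketch; diagonal entries are not treated specially.)"""
    A = rng.standard_normal((n, n))
    X = np.triu(A) + np.triu(A, 1).T       # symmetrize from the upper triangle
    if planted:
        U = rng.choice(n, size=L, replace=False)
        X[np.ix_(U, U)] += 1.0             # shift the hidden L-by-L block to mean 1
    return X

rng = np.random.default_rng(0)
X = hidden_submatrix_instance(200, 30, planted=True, rng=rng)
print(X.shape)
```

Detection then has to decide, from X alone, whether such a shifted block is present.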
Given (a realization of) X, one would like to distinguish between the hypothesis that all random variables X_{i,j} have the same distribution F0 and the hypothesis that there is a set U ⊆ [n], with L := |U|, so that all random variables in the submatrix X_U := (X_{s,t} : s, t ∈ U) have a distribution F1 that is different from the distribution of all other elements of X, which are still distributed as F0. We refer to X_U as the hidden submatrix.

The same problem was recently studied in [1, 8] and, for the asymmetric case (where no symmetry assumption is imposed on the independent entries of X), in [6, 18, 20]. Detection problems with a similar flavor (such as the hidden clique problem) have been studied over the years in several fields, including computer science, physics and statistics. We refer to Section 5 for further discussion of the related literature. An intriguing outcome of these works is that, while the two hypotheses are statistically distinguishable as soon as L ≥ C log n (for C a sufficiently large constant) [7], practical algorithms require significantly larger L. In this paper we study the class of spectral (or eigenvalue-based) tests detecting the hidden submatrix. Our proof technique naturally allows us to consider two further generalizations of this problem that are of independent interest. We briefly summarize our results below.

The Gaussian hidden clique problem. This is a special case of the above hypothesis testing setting, whereby F0 = N(0, 1) and F1 = N(1, 1) (entries on the diagonal are defined slightly differently in order to simplify calculations). Here and below, N(m, σ²) denotes the Gaussian distribution of mean m and variance σ². Equivalently, let Z be a random matrix from the Gaussian Orthogonal Ensemble (GOE), i.e. Z_{ij} ~ N(0, 1/n) independently for i < j, and Z_{ii} ~ N(0, 2/n).
Then, under hypothesis H1,L, we have X = n^{−1/2} 1_U 1_U^T + Z (1_U being the indicator vector of U), and under hypothesis H0, X = Z (the factor n in the normalization is for technical convenience). The Gaussian hidden clique problem can be thought of as the following clustering problem: there are n elements, and the entry (i, j) measures the similarity between elements i and j. The hidden submatrix corresponds to a cluster of similar elements, and our goal is to determine, given the matrix, whether there is a large cluster of similar elements or, alternatively, whether all similarities are essentially random (Gaussian) noise.

Our focus in this work is on the following restricted hypothesis testing question. Let λ1 ≥ λ2 ≥ ··· ≥ λn be the ordered eigenvalues of X. Is there a test that depends only on λ1, ..., λn and that distinguishes H0 from H1,L 'reliably,' i.e. with error probability converging to 0 as n → ∞? Notice that the distribution of the eigenvalues does not depend on U as long as U is independent of the noise Z. We can therefore think of U as fixed for this question. Historically, the first polynomial time algorithm for detecting a planted clique of size O(√n) in a random graph [2] relied on spectral methods (see Section 5 for more details). This is one reason for our interest in spectral tests for the Gaussian hidden clique problem.

If L ≥ (1 + ε)√n, then [11] implies that a simple test checking whether λ1 ≥ 2 + δ, for some δ = δ(ε) > 0, is reliable for the Gaussian hidden clique problem. We prove that this result is tight, in the sense that no spectral test is reliable for L ≤ (1 − ε)√n.

Rank-one matrices in Gaussian noise. Our proof technique builds on a simple observation.
Since the noise Z is invariant under orthogonal transformations¹, the above question is equivalent to the following testing problem. For β ∈ R_{≥0} and v ∈ R^n with ||v||_2 = 1 a uniformly random unit vector, test H0: X = Z versus H1: X = β v v^T + Z. (The correspondence between the two problems yields β = L/√n.)

Again, this problem (and a closely related asymmetric version [22]) has been studied in the literature, and it follows from [11] that a reliable test exists for β ≥ 1 + ε. We provide a simple proof (based on the second moment method) that no test is reliable for β < 1 − ε.

Rank-one tensors in Gaussian noise. It turns out that the same proof applies to an even more general problem: detecting a rank-one signal in a noisy tensor. We carry out our analysis in this more general setting for two reasons. First, we think that this clarifies what aspects of the model are important for our proof technique to apply. Second, the problem of estimating tensors from noisy data has attracted significant interest recently within the machine learning community [15, 21].

More precisely, we consider a noisy tensor X ∈ ⊗^k R^n of the form X = β v^⊗k + Z, where Z is Gaussian noise and v is a random unit vector. We consider the problem of testing this hypothesis against H0: X = Z. We establish a threshold β_k^2nd such that no test can be reliable for β < β_k^2nd (in particular, β_2^2nd = 1). Two differences are worth remarking for k ≥ 3 with respect to the more familiar matrix case k = 2. First, we do not expect the second moment bound β_k^2nd to be tight, i.e.
a reliable test to exist for all β > β_k^2nd; on the other hand, we can show that it is tight up to a universal (k- and n-independent) constant. Second, below β_k^2nd the problem is more difficult than the matrix version below β_2^2nd = 1: not only does no reliable test exist but, asymptotically, any test behaves as random guessing. For more details on our results regarding noisy tensors, see Theorem 3.

¹By this we mean that, for any orthogonal matrix R ∈ O(n), independent of Z, RZR^T is distributed as Z.

2 Main result for spectral detection

Let Z be a GOE matrix as defined in the previous section. Equivalently, if G is an (asymmetric) matrix with i.i.d. entries G_{i,j} ~ N(0, 1),

Z = (1/√(2n)) (G + G^T) .  (1)

For a deterministic sequence of vectors v(n), ||v(n)||_2 = 1, we consider the two hypotheses

H0 :   X = Z ,
H1,β : X = β v v^T + Z .  (2)

A special example is provided by the Gaussian hidden clique problem, in which case β = L/√n and v = 1_U/√L for some set U ⊆ [n], |U| = L:

H0 :   X = Z ,
H1,L : X = (1/√n) 1_U 1_U^T + Z .  (3)

Observe that the distribution of the eigenvalues of X, under either alternative, is invariant to the choice of the vector v (or subset U), as long as the norm of v is kept fixed. Therefore, any successful algorithm that examines only the eigenvalues will distinguish between H0 and H1,β, but will not give any information on the vector v (or the subset U, in the case of H1,L).

We let Q0 = Q0(n) (respectively, Q1 = Q1(n)) denote the distribution of the eigenvalues of X under H0 (respectively, H1 = H1,β or H1,L).

A spectral statistical test for distinguishing between H0 and H1 (or simply a spectral test) is a measurable map Tn : (λ1, ..., λn) ↦ {0, 1}.
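A minimal numerical sketch of such a spectral test (the threshold 2 + δ is the test from Section 1; n, β, δ are our own illustrative choices, and the limit β + 1/β for β > 1 is the known spiked-matrix fact cited here as [11]):

```python
import numpy as np

# Sketch of a spectral test T(lambda_1,...,lambda_n) = 1{lambda_1 >= 2 + delta}.
# Under H0 the largest eigenvalue of the GOE matrix Z sticks to the bulk edge 2;
# under H1,beta with beta > 1 it concentrates near beta + 1/beta > 2.
rng = np.random.default_rng(1)
n, beta, delta = 1500, 1.5, 0.05

G = rng.standard_normal((n, n))
Z = (G + G.T) / np.sqrt(2 * n)           # GOE normalization of equation (1)

v = rng.standard_normal(n)
v /= np.linalg.norm(v)                    # uniformly random unit vector

lam_null = np.linalg.eigvalsh(Z)[-1]
lam_planted = np.linalg.eigvalsh(beta * np.outer(v, v) + Z)[-1]

T_null = lam_null >= 2 + delta            # should typically be False (no detection)
T_planted = lam_planted >= 2 + delta      # should typically be True, since beta + 1/beta = 2.1667 > 2
```

For β < 1 the spike does not move λ1 away from 2, which is the regime where the lower bound of this paper applies.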
To formulate precisely what we mean by the word distinguish, we introduce the following notion.
Definition 1. For each n ∈ N, let P0,n, P1,n be two probability measures on the same measure space (Ωn, Fn). We say that the sequence (P1,n) is contiguous with respect to (P0,n) if, for any sequence of events An ∈ Fn,

lim_{n→∞} P0,n(An) = 0  ⇒  lim_{n→∞} P1,n(An) = 0 .  (4)

Note that contiguity is not in general a symmetric relation.
In the context of the spectral statistical tests described above, the sequences An in Definition 1 (with P0,n = Q0(n) and P1,n = Q1(n)) can be put in correspondence with spectral tests Tn by taking An = {(λ1, ..., λn) : Tn(λ1, ..., λn) = 0}. We will thus say that H1 is spectrally contiguous with respect to H0 if Q1(n) is contiguous with respect to Q0(n).
Our main result on the Gaussian hidden clique problem is the following.
Theorem 1. For any sequence L = L(n) satisfying lim sup_{n→∞} L(n)/√n < 1, the hypotheses H1,L are spectrally contiguous with respect to H0.

2.1 Contiguity and integrability

Contiguity is related to a notion of uniform absolute continuity of measures. Recall that a probability measure μ on a measure space is absolutely continuous with respect to another probability measure ν if for every measurable set A, ν(A) = 0 implies μ(A) = 0, in which case there exists a ν-integrable, non-negative function f ≡ dμ/dν (the Radon-Nikodym derivative of μ with respect to ν) so that μ(A) = ∫_A f dν for every measurable set A. We then have the following known useful fact:
Lemma 2.
Within the setting of Definition 1, assume that P1,n is absolutely continuous with respect to P0,n, and denote by Λn ≡ dP1,n/dP0,n its Radon-Nikodym derivative.
(a) If lim sup_{n→∞} E0,n(Λn²) < ∞, then (P1,n) is contiguous with respect to (P0,n).
(b) If lim_{n→∞} E0,n(Λn²) = 1, then lim_{n→∞} ||P0,n − P1,n||_TV = 0, where ||·||_TV denotes the total variation distance, i.e.

||P0,n − P1,n||_TV ≡ sup_A |P0,n(A) − P1,n(A)| .

2.2 Method and structure of the paper

Consider problem (2). We use the fact that the law of the eigenvalues under both H0 and H1,β is invariant under conjugation by an orthogonal matrix. Once we conjugate matrices sampled under the hypothesis H1,β by an independent orthogonal matrix sampled according to the Haar distribution, we get a matrix distributed as

X = β v v^T + Z ,  (5)

where v is uniform on the n-dimensional sphere, and Z is a GOE matrix (with off-diagonal entries of variance 1/n). Letting P1,n denote the law of β v v^T + Z and P0,n denote the law of Z, we show that P1,n is contiguous with respect to P0,n, which implies that the law of the eigenvalues Q1(n) is contiguous with respect to Q0(n).
To show the contiguity, we consider a more general setup, of independent interest, of Gaussian tensors of order k, and in that setup show that the Radon-Nikodym derivative Λn,L = dP1,n/dP0,n is uniformly square integrable under P0,n; an application of Lemma 2 then quickly yields Theorem 1.
The structure of the paper is as follows. In the next section, we define formally the detection problem for a symmetric tensor of order k ≥ 2. We show the existence of a threshold under which detection is not possible (Theorem 3), and show how Theorem 1 follows from this.
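The first step of this strategy — conjugating by an independent Haar orthogonal matrix, which changes neither the eigenvalues nor the law of the GOE noise — can be sketched numerically (sampling Haar measure via QR of a Gaussian matrix with the standard sign fix is our own implementation choice, not notation from the paper):

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Haar-distributed R in O(n) via QR of a Gaussian matrix (standard sign fix)."""
    Q, Rtri = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(Rtri))     # normalize column signs

rng = np.random.default_rng(2)
n = 50
G = rng.standard_normal((n, n))
X = (G + G.T) / np.sqrt(2 * n)            # a GOE sample
R = haar_orthogonal(n, rng)

# Conjugation preserves the spectrum exactly, so any eigenvalue-only test
# cannot distinguish X from R X R^T.
assert np.allclose(np.linalg.eigvalsh(R @ X @ R.T), np.linalg.eigvalsh(X))
```

Applied under H1,β, this conjugation randomizes the spike direction into a uniformly random unit vector, which is exactly the reduction to (5).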
Section 4 is devoted to the proof of Theorem 3, and concludes with some additional remarks and consequences of Theorem 3. Finally, Section 5 is devoted to a description of the relation between the Gaussian hidden clique problem and the hidden clique problem in computer science, and related literature.

3 A symmetric tensor model and a reduction

Exploiting rotational invariance, we will reduce the spectral detection problem to a standard detection problem between random matrices. Since the latter generalizes to a tensor setup, we first introduce a general Gaussian hypothesis testing problem for k-tensors, which is of independent interest. We then explain how the spectral detection problem reduces to the special case k = 2.

3.1 Preliminaries and notation

We use lower-case boldface for vectors (e.g. u, v) and upper-case boldface for matrices and tensors (e.g. X, Z). The ordinary scalar product and ℓp norm over vectors are denoted by ⟨u, v⟩ = ∑_{i=1}^n u_i v_i and ||v||_p. We write S^{n−1} for the unit sphere in n dimensions.
Given a real k-th order tensor X ∈ ⊗^k R^n, we let {X_{i1,...,ik}} denote its coordinates. The outer product of two tensors is X ⊗ Y, and, for v ∈ R^n, we define v^⊗k = v ⊗ ··· ⊗ v ∈ ⊗^k R^n as the k-th outer power of v.
S^{n−1} ≡ {x ∈ R^n : ||x||_2 = 1} .  (6)

We define the inner product of two tensors X, Y ∈ ⊗^k R^n as

⟨X, Y⟩ = ∑_{i1,...,ik ∈ [n]} X_{i1,...,ik} Y_{i1,...,ik} .  (7)

We define the Frobenius (Euclidean) norm of a tensor X by ||X||_F = √⟨X, X⟩, and its operator norm by

||X||_op ≡ max{ ⟨X, u1 ⊗ ··· ⊗ uk⟩ : for all i ∈ [k], ||ui||_2 ≤ 1 } .  (8)

It is easy to check that this is indeed a norm. For the special case k = 2, it reduces to the ordinary ℓ2 matrix operator norm (equivalently, to the largest singular value of X).
For a permutation π ∈ Sk, we will denote by X^π the tensor with permuted indices, X^π_{i1,...,ik} = X_{iπ(1),...,iπ(k)}. We call the tensor X symmetric if, for any permutation π ∈ Sk, X^π = X. It is proved in [23] that, for symmetric tensors, we have the equivalent representation

||X||_op ≡ max{ |⟨X, u^⊗k⟩| : ||u||_2 ≤ 1 } .  (9)

We define R̄ ≡ R ∪ {∞} with the usual conventions of arithmetic operations.

3.2 The symmetric tensor model and main result

We denote by G ∈ ⊗^k R^n a tensor with independent and identically distributed entries G_{i1,...,ik} ~ N(0, 1) (note that this tensor is not symmetric). We define the symmetric standard normal noise tensor Z ∈ ⊗^k R^n by

Z = √(2/n) (1/k!) ∑_{π ∈ Sk} G^π .  (10)

Note that the entries with pairwise distinct indices form an i.i.d. collection {Z_{i1,...,ik}}_{i1<···<ik}.

For k = 2 and β > 1, it is known that the largest eigenvalue λ1(X) of X converges almost surely to β + 1/β [11].
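Returning to the symmetric noise tensor: the symmetrization in (10) can be sketched for k = 3 as follows (the closing variance check on distinct-index entries, 2/(k! n) under this normalization, is our own sanity check, not a formula from the paper):

```python
import math
import numpy as np
from itertools import combinations, permutations

# Sketch of the symmetric noise tensor (10) for k = 3: sum the i.i.d. Gaussian
# tensor G over all index permutations and scale by sqrt(2/n)/k!.
rng = np.random.default_rng(3)
n, k = 30, 3
G = rng.standard_normal((n,) * k)
Z = np.sqrt(2 / n) / math.factorial(k) * sum(
    np.transpose(G, axes=p) for p in permutations(range(k)))

# Z is symmetric: invariant under any permutation of its indices.
assert all(np.allclose(Z, np.transpose(Z, axes=p)) for p in permutations(range(k)))

# Entries with pairwise distinct indices are i.i.d.; under this normalization
# their variance is 2/(k! * n) (our own check).
vals = np.array([Z[i, j, l] for i, j, l in combinations(range(n), 3)])
print(vals.var() * math.factorial(k) * n / 2)   # close to 1
```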
As a consequence, ||P0 − Pβ||_TV → 1 for all β > 1: the second moment bound is tight.
For k ≥ 3, it follows by the triangle inequality that ||X||_op ≥ β − ||Z||_op, and further lim sup_{n→∞} ||Z||_op ≤ μk almost surely as n → ∞ [19, 5] for some bounded μk. It follows that ||P0 − Pβ||_TV → 1 for all β > 2μk [21]. Hence, the second moment bound is off by a k-dependent factor. For large k, 2μk = √(2 log k) + O_k(1), and hence the factor is indeed bounded in k.
Behavior below the threshold. Let us stress an important qualitative difference between k = 2 and k ≥ 3, for β < β_k^2nd. For k ≥ 3, the two models are indistinguishable and any test is essentially as good as random guessing. Formally, for any measurable function T : ⊗^k R^n → {0, 1}, we have

lim_{n→∞} [ P0(T(X) = 1) + Pβ(T(X) = 0) ] = 1 .  (15)

For k = 2, our result implies that, for β < 1, ||P0 − Pβ||_TV is bounded away from 1. On the other hand, it is easy to see that it is bounded away from 0 as well, i.e.

0 < lim inf_{n→∞} ||P0 − Pβ||_TV ≤ lim sup_{n→∞} ||P0 − Pβ||_TV < 1 .  (16)

Indeed, consider for instance the statistic S = Tr(X). Under P0, S ~ N(0, 2), while under Pβ, S ~ N(β, 2). Hence

lim inf_{n→∞} ||P0 − Pβ||_TV ≥ ||N(0, 1) − N(β/√2, 1)||_TV = 1 − 2Φ(−β/(2√2)) > 0 .  (17)

(Here Φ(x) = ∫_{−∞}^x e^{−z²/2} dz/√(2π) is the Gaussian distribution function.)
The same phenomenon for rectangular matrices (k = 2) is discussed in detail in [22].

3.3 Reduction of spectral detection to the symmetric tensor model, k = 2

Recall that in the setup of Theorem 1, Q0,n is the law of the eigenvalues of X under H0 and Q1,n is the law of the eigenvalues of X under H1,L. Then Q1,n is invariant under conjugation by orthogonal matrices. Therefore, the detection problem is not changed if we replace X = n^{−1/2} 1_U 1_U^T + Z by

X̂ ≡ R X R^T = (1/√n) R1_U (R1_U)^T + R Z R^T ,  (18)

where R ∈ O(n) is an orthogonal matrix sampled according to the Haar measure. A direct calculation yields

X̂ = β v v^T + Z̃ ,  (19)

where v is uniform on the n-dimensional sphere, β = L/√n, and Z̃ is a GOE matrix (with off-diagonal entries of variance 1/n). Furthermore, v and Z̃ are independent of one another.
Let P1,n be the law of X̂. Note that P1,n = P_β^{(k=2)} with β = L/√n. We can relate the detection problem of H0 vs. H1,L to the detection problem of P0,n vs. P1,n as follows.
Lemma 4. (a) If P1,n is contiguous with respect to P0,n, then H1,L is spectrally contiguous with respect to H0.
(b) We have

||Q0,n − Q1,n||_TV ≤ ||P0,n − P1,n||_TV .

In view of Lemma 4, Theorem 1 is an immediate consequence of Theorem 3.

4 Proof of Theorem 3

The proof uses the following large deviations lemma, which follows, for instance, from [9, Proposition 2.3].
Lemma 5. Let v be a uniformly random vector on the unit sphere S^{n−1} and let ⟨v, e1⟩ be its first coordinate. Then, for any interval [a, b] with −1 ≤ a < b ≤ 1,

lim_{n→∞} (1/n) log P(⟨v, e1⟩ ∈ [a, b]) = max{ (1/2) log(1 − q²) : q ∈ [a, b] } .  (20)

Proof of Theorem 3. We denote by Λ the Radon-Nikodym derivative of Pβ with respect to P0. By definition E0Λ = 1.
It is easy to derive the following formula:

Λ = ∫ exp{ (nβ/2) ⟨X, v^⊗k⟩ − nβ²/4 } μn(dv) ,  (21)

where μn is the uniform measure on S^{n−1}. Squaring and using (11), we get

E0Λ² = e^{−nβ²/2} ∫∫ E0 exp{ (nβ/2) ⟨X, v1^⊗k + v2^⊗k⟩ } μn(dv1) μn(dv2)  (22)
     = e^{−nβ²/2} ∫∫ exp{ (nβ²/4) ||v1^⊗k + v2^⊗k||_F² } μn(dv1) μn(dv2)  (23)
     = ∫∫ exp{ (nβ²/2) ⟨v1, v2⟩^k } μn(dv1) μn(dv2)  (24)
     = ∫ exp{ (nβ²/2) ⟨v, e1⟩^k } μn(dv) ,  (25)

where in the first step we used (11) and in the last step we used rotational invariance.
Let Fβ : [−1, 1] → R be defined by

Fβ(q) ≡ β²q^k/2 + (1/2) log(1 − q²) .

Using Lemma 5 and Varadhan's lemma, for any −1 ≤ a < b ≤ 1,

∫ exp{ (nβ²/2) ⟨v, e1⟩^k } I(⟨v, e1⟩ ∈ [a, b]) μn(dv) = exp{ n max_{q ∈ [a,b]} Fβ(q) + o(n) } .

It follows from the definition of β_k^2nd that max_{|q| ≥ ε} Fβ(q) < 0 for any ε > 0. Hence

E0Λ² ≤ ∫ exp{ (nβ²/2) ⟨v, e1⟩^k } I(|⟨v, e1⟩| ≤ ε) μn(dv) + e^{−c(ε)n} ,

for some c(ε) > 0 and all n large enough.
Next notice that, under μn, ⟨v, e1⟩ is distributed as G/(G² + Zn−1)^{1/2}, where G ~ N(0, 1) and Zn−1 is a χ² random variable with n − 1 degrees of freedom, independent of G. Then, letting Zn ≡ G² + Zn−1 (a χ² with n degrees of freedom),

E0Λ² ≤ E{ exp( (nβ²/2) |G|^k/Zn^{k/2} ) I(|G/Zn^{1/2}| ≤ ε) } + e^{−c(ε)n}
     ≤ E{ exp( (nβ²/2) |G|^k/Zn^{k/2} ) I(|G/Zn^{1/2}| ≤ ε) I(Zn−1 ≥ n(1 − δ)) } + e^{nβ²ε^k/2} P{Zn−1 ≤ n(1 − δ)} + e^{−c(ε)n}
     ≤ E{ exp( (n^{1−(k/2)}β²/(2(1 − δ)^{k/2})) |G|^k ) I(|G|² ≤ 2εn) } + e^{nβ²ε^k/2} P{Zn−1 ≤ n(1 − δ)} + e^{−c(ε)n}
     = (2/√(2π)) ∫_0^{√(2εn)} e^{C(β,δ) n^{1−k/2} x^k − x²/2} dx + e^{nβ²ε^k/2} P{Zn−1 ≤ n(1 − δ)} + e^{−c(ε)n} ,  (26)

where C(β, δ) = β²/(2(1 − δ)^{k/2}). Now, for any δ > 0, we can (and will) choose ε small enough so that e^{nβ²ε^k/2} P{Zn−1 ≤ n(1 − δ)} → 0 exponentially fast (by tail bounds on χ² random variables) and, if k ≥ 3, the argument of the exponent in the integral on the right-hand side of (26) is bounded above by −x²/4 over the range of integration; the latter is possible since this argument vanishes at x∗ = (n^{k/2−1}/(2C(β, δ)))^{1/(k−2)}, which exceeds √(2εn) once ε is small enough. Hence, for any δ > 0, and all n large enough, we have

E0Λ² ≤ (2/√(2π)) ∫_0^{√(2εn)} e^{C(β,δ) n^{1−k/2} x^k − x²/2} dx + e^{−c(δ)n} ,  (27)

for some c(δ) > 0.
Now, for k ≥ 3 the integrand in (27) is dominated by e^{−x²/4} and converges pointwise (as n → ∞) to e^{−x²/2}, so that by dominated convergence the right-hand side of (27) converges to 1.
Therefore, since E0Λ² ≥ (E0Λ)² = 1,

k ≥ 3 :   lim_{n→∞} E0Λ² = 1 .  (28)

For k = 2, the argument of the exponent is independent of n and can be integrated immediately, yielding (after taking the limit δ → 0)

k = 2 :   lim sup_{n→∞} E0Λ² ≤ 1/√(1 − β²) .  (29)

(Indeed, the above calculation implies that the limit exists and is given by the right-hand side.)
The proof is completed by invoking Lemma 2.

5 Related work

In the classical G(n, 1/2) planted clique problem, the computational problem is to find the planted clique (of cardinality k) in polynomial time, where we assume the location of the planted clique is hidden and is not part of the input. There are several algorithms that recover the planted clique in polynomial time when k = C√n, where C > 0 is a constant independent of n [2, 8, 10]. Despite significant effort, no polynomial time algorithm for this problem is known when k = o(√n). In the decision version of the planted clique problem, one seeks an efficient algorithm that distinguishes between a random graph distributed as G(n, 1/2) and a random graph containing a planted clique of size k ≥ (2 + δ) log n (for δ > 0; the natural threshold for the problem is the size of the largest clique in a random sample of G(n, 1/2), which is asymptotic to 2 log n [14]). No polynomial time algorithm is known for this decision problem if k = o(√n).
As another example, consider the following setting introduced by [4] (see also [1]): one is given a realization of an n-dimensional Gaussian vector x := (x1, ..., xn) with i.i.d. entries. The goal is to distinguish between the following two hypotheses. Under the first hypothesis, all entries in x are i.i.d. standard normals.
Under the second hypothesis, one is given a family of subsets C := {S1, ..., Sm} such that for every 1 ≤ k ≤ m, Sk ⊆ {1, ..., n}, and there exists an i ∈ {1, ..., m} such that, for any α ∈ Si, xα is a Gaussian random variable with mean μ > 0 and unit variance, whereas for every α not in Si, xα is standard normal. (The second hypothesis does not specify the index i, only its existence.) The main question is how large μ must be so that one can reliably distinguish between these two hypotheses. In [4], the indices α are vertices of certain undirected graphs and the family C is a set of pre-specified paths in these graphs.
The Gaussian hidden clique problem is related to various applications in statistics and computational biology [6, 18]. That detection is statistically possible when L ≫ log n was established in [1]. In terms of polynomial time detection, [8] show that detection is possible when L = Θ(√n) in the symmetric case. As noted, no polynomial time algorithm is known for the Gaussian hidden clique problem when L = o(√n). In [1, 20] it was hypothesized that the Gaussian hidden clique problem should be difficult when L ≪ √n.
The closest results to ours are those of [22]. In the language of the present paper, these authors consider a rectangular matrix of the form X = λ v1 v2^T + Z ∈ R^{n1×n2}, whereby Z has i.i.d. entries Zij ~ N(0, 1/n1), v1 is deterministic of unit norm, and v2 has entries which are i.i.d. N(0, 1/n1), independent of Z. They consider the problem of testing this distribution against λ = 0.
Setting c = lim_{n→∞} n1/n2, it is proved in [22] that the distributions of the singular values of X under the null and the alternative are mutually contiguous if λ < √c, and not mutually contiguous if λ > √c.
While [22] derive some more refined results, their proofs rely on advanced tools from random matrix theory [13], while our proof is simpler, and generalizable to other settings (e.g. tensors).

References

[1] L. Addario-Berry, N. Broutin, L. Devroye, G. Lugosi. On combinatorial testing problems. Annals of Statistics 38(5) (2011), 3063–3092.

[2] N. Alon, M. Krivelevich and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures and Algorithms 13 (1998), 457–466.

[3] G. W. Anderson, A. Guionnet and O. Zeitouni. An introduction to random matrices. Cambridge University Press (2010).

[4] E. Arias-Castro, E. J. Candès, H. Helgason and O. Zeitouni. Searching for a trail of evidence in a maze. Annals of Statistics 36 (2008), 1726–1757.

[5] A. Auffinger, G. Ben Arous, and J. Černý. Random matrices and complexity of spin glasses. Communications on Pure and Applied Mathematics 66(2) (2013), 165–201.

[6] S. Balakrishnan, M. Kolar, A. Rinaldo, A. Singh, and L. Wasserman. Statistical and computational tradeoffs in biclustering. NIPS Workshop on Computational Trade-offs in Statistical Learning (2011).

[7] S. Bhamidi, P. S. Dey, and A. B. Nobel. Energy landscape for large average submatrix detection problems in Gaussian random matrices. arXiv:1211.2284.

[8] Y. Deshpande and A. Montanari. Finding hidden cliques of size √(N/e) in nearly linear time. Foundations of Computational Mathematics (2014), 1–60.

[9] A. Dembo and O. Zeitouni. Matrix optimization under random external fields. arXiv:1409.4606.

[10] U. Feige and R. Krauthgamer.
Finding and certifying a large hidden clique in a semirandom graph. Random Structures and Algorithms 16(2) (2000), 195–208.

[11] D. Féral and S. Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics 272 (2007), 185–228.

[12] Z. Füredi and J. Komlós. The eigenvalues of random symmetric matrices. Combinatorica 1 (1981), 233–241.

[13] A. Guionnet and M. Maïda. A Fourier view on the R-transform and related asymptotics of spherical integrals. Journal of Functional Analysis 222 (2005), 435–490.

[14] G. R. Grimmett and C. J. H. McDiarmid. On colouring random graphs. Mathematical Proceedings of the Cambridge Philosophical Society 77 (1975), 313–324.

[15] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. Journal of Computer and System Sciences 78(5) (2012), 1460–1480.

[16] M. Jerrum. Large cliques elude the Metropolis process. Random Structures and Algorithms 3(4) (1992), 347–360.

[17] A. Knowles and J. Yin. The isotropic semicircle law and deformation of Wigner matrices. Communications on Pure and Applied Mathematics 66(11) (2013), 1663–1749.

[18] M. Kolar, S. Balakrishnan, A. Rinaldo, and A. Singh. Minimax localization of structural information in large noisy matrices. Neural Information Processing Systems (NIPS) (2011), 909–917.

[19] M. Talagrand. Free energy of the spherical mean field model. Probability Theory and Related Fields 134(3) (2006), 339–382.

[20] Z. Ma and Y. Wu. Computational barriers in minimax submatrix detection. arXiv:1309.5914.

[21] A. Montanari and E. Richard. A statistical model for tensor PCA. Neural Information Processing Systems (NIPS) (2014), 2897–2905.

[22] A. Onatski, M. J. Moreira, M. Hallin, et al. Asymptotic power of sphericity tests for high-dimensional data. The Annals of Statistics 41(3) (2013), 1204–1231.

[23] W. C. Waterhouse.
The absolute-value estimate for symmetric multilinear forms. Linear Algebra and its Applications 128 (1990), 97–105.", "award": [], "sourceid": 104, "authors": [{"given_name": "Andrea", "family_name": "Montanari", "institution": "Stanford"}, {"given_name": "Daniel", "family_name": "Reichman", "institution": "Cornell University"}, {"given_name": "Ofer", "family_name": "Zeitouni", "institution": "Weizmann Institute and Courant Institute"}]}