{"title": "Average-case hardness of RIP certification", "book": "Advances in Neural Information Processing Systems", "page_first": 3819, "page_last": 3827, "abstract": "The restricted isometry property (RIP) for design matrices gives guarantees for optimal recovery in sparse linear models. It is of high interest in compressed sensing and statistical learning. This property is particularly important for computationally efficient recovery methods. As a consequence, even though it is in general NP-hard to check that RIP holds, there have been substantial efforts to find tractable proxies for it. These would allow the construction of RIP matrices and the polynomial-time verification of RIP given an arbitrary matrix. We consider the framework of average-case certifiers, that never wrongly declare that a matrix is RIP, while being often correct for random instances. While there are such functions which are tractable in a suboptimal parameter regime, we show that this is a computationally hard task in any better regime. Our results are based on a new, weaker assumption on the problem of detecting dense subgraphs.", "full_text": "Average-case hardness of RIP certi\ufb01cation\n\nTengyao Wang\n\nCentre for Mathematical Sciences\n\nCambridge, CB3 0WB, United Kingdom\n\nt.wang@statslab.cam.ac.uk\n\nQuentin Berthet\n\nCentre for Mathematical Sciences\n\nCambridge, CB3 0WB, United Kingdom\n\nq.berthet@statslab.cam.ac.uk\n\n1986 Mathematics Road\n\nVancouver BC V6T 1Z2, Canada\n\nYaniv Plan\n\nyaniv@math.ubc.ca\n\nAbstract\n\nThe restricted isometry property (RIP) for design matrices gives guarantees for\noptimal recovery in sparse linear models.\nIt is of high interest in compressed\nsensing and statistical learning. This property is particularly important for com-\nputationally ef\ufb01cient recovery methods. As a consequence, even though it is in\ngeneral NP-hard to check that RIP holds, there have been substantial efforts to\n\ufb01nd tractable proxies for it. 
These would allow the construction of RIP matrices\nand the polynomial-time verification of RIP given an arbitrary matrix. We consider\nthe framework of average-case certifiers, that never wrongly declare that a\nmatrix is RIP, while being often correct for random instances. While there are\nsuch functions which are tractable in a suboptimal parameter regime, we show\nthat this is a computationally hard task in any better regime. Our results are based\non a new, weaker assumption on the problem of detecting dense subgraphs.\n\nIntroduction\n\nIn many areas of data science, high-dimensional signals contain rich structure. It is of great interest\nto leverage this structure to improve our ability to describe characteristics of the signal and\nto make future predictions. Sparsity is a structure of wide applicability (see, e.g. Mallat, 1999;\nRauhut and Foucart, 2013; Eldar and Kutyniok, 2012), with a broad literature dedicated to its study\nin various scientific fields.\nThe sparse linear model takes the form y = X\u03b2 + \u03b5, where y \u2208 R^n is a vector of observations,\nX \u2208 R^{n\u00d7p} is a design matrix, \u03b5 \u2208 R^n is noise, and the vector \u03b2 \u2208 R^p is assumed to have a\nsmall number k of non-zero entries. Estimating \u03b2 or the mean response, X\u03b2, are among the most\nwidely studied problems in signal processing, as well as in statistical learning. In high-dimensional\nproblems, one would wish to recover \u03b2 with as few observations as possible. For an incoherent\ndesign matrix, it is known that an order of k^2 observations suffice (Donoho, Elad and Temlyakov,\n2006; Donoho and Elad, 2003). 
However, this appears to require a number of observations far\nexceeding the information content of \u03b2, which has only k variables, albeit with unknown locations.\nThis dependence in k can be greatly improved by using design matrices that are almost isometries\non some low dimensional subspaces, i.e., matrices that satisfy the restricted isometry property with\nparameters k and \u03b8, or RIP(k, \u03b8) (see Definition 1.1). It is a highly robust property, and in fact\nimplies that many different polynomial time methods, such as greedy methods (Blumensath and\nDavies, 2009; Needell and Tropp, 2009; Dai and Milenkovic, 2009) and convex optimization (Cand\u00e8s,\n2008; Cand\u00e8s, Romberg and Tao, 2006b; Cand\u00e8s and Tao, 2005), are stable in recovering \u03b2.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nRandom matrices are known to satisfy the RIP when the number n of observations is more than about\nk log(p)/\u03b8^2. These results were developed in the field of compressed sensing (Cand\u00e8s, Romberg and\nTao, 2006a; Donoho, 2006; Rauhut and Foucart, 2013; Eldar and Kutyniok, 2012) where the use of\nrandomness still remains pivotal for near-optimal results. Properties related to the conditioning of\ndesign matrices have also been shown to play a key role in the statistical properties of computationally\nefficient estimators of \u03b2 (Zhang, Wainwright and Jordan, 2014). While the assumption of\nrandomness allows great theoretical leaps, it leaves open questions for practitioners.\nScientists working on data closely following this model cannot always choose their design matrix\nX, or at least choose one that is completely random. Moreover, it is in general practically impossible\nto check that a given matrix satisfies these desired properties, as RIP certification is NP-hard\n(Bandeira et al., 2012). 
Having access to a function, or statistic, of X that could be easily computed,\nwhich determines how well \u03b2 may be estimated, would therefore be of great help. The search\nfor such statistics has been of great importance for over a decade now, and several have been proposed\n(d\u2019Aspremont and El Ghaoui, 2011; Lee and Bresler, 2008; Juditsky and Nemirovski, 2011;\nd\u2019Aspremont, Bach and El Ghaoui, 2008). Perhaps the simplest and most popular is the incoherence\nparameter, which measures the maximum inner product between distinct, normalized, columns of\nX. However, all of these are known to necessarily fail to guarantee good recovery when p \u2265 2n\nunless n is of order k^2 (d\u2019Aspremont and El Ghaoui, 2011). Given a specific problem instance, the\nstrong recovery guarantees of compressed sensing cannot be verified based on these statistics.\nIn this article, we study the problem of average-case certification of the Restricted Isometry Property\n(RIP). A certifier takes as input a design matrix X, always outputs \u2018false\u2019 when X does not satisfy\nthe property, and outputs \u2018true\u2019 for a large proportion of matrices (see Definition 2.1). Indeed, worst-case\nhardness does not preclude a problem from being solvable for most instances. The link between\nrestricted isometry and incoherence implies that polynomial time certifiers exist in a regime where\nn is of order k^2 log(p)/\u03b8^2. It is natural to ask whether the RIP can be certified for sample size\nn \u226b k log(p)/\u03b8^2, where most matrices (with respect to, say, the Gaussian measure) are RIP. 
If it\ncan, this would also provide a Las Vegas algorithm to construct RIP design matrices of optimal sizes.\nThis should be compared with the currently existing limitations for the deterministic construction of\nRIP matrices.\nOur main result is that certification in this sense is hard even in a near-optimal regime, assuming a\nnew, weaker assumption on detecting dense subgraphs, related to the Planted Clique hypothesis.\nTheorem (Informal). For any \u03b1 < 1, there is no computationally efficient, average-case certifier\nfor the class RIP_{n,p}(k, \u03b8) uniformly over an asymptotic regime where n \u226a k^{1+\u03b1}/\u03b8^2.\nThis suggests that even in the average case, RIP certification requires almost k^2 log(p)/\u03b8^2 observations.\nThis contrasts sharply with the fact that a random matrix satisfies RIP with high probability\nwhen n exceeds about k log(p)/\u03b8^2. Thus, there appears to be a large gap between what a practitioner\nmay be able to certify given a specific problem instance, and what holds for a random matrix. On the\nother hand, if a certifier is found which fills this gap, the result would not only have huge practical\nimplications in compressed sensing and statistical learning, but would also disprove a long-standing\nconjecture from computational complexity theory.\nWe focus solely on the restricted isometry property, but other conditions under which compressed\nsensing is possible are also known. Extending our results to the restricted eigenvalue condition\n(Bickel, Ritov and Tsybakov, 2009) or other conditions (see van de Geer and B\u00fchlmann, 2009, and\nreferences therein) is an interesting path for future research.\nOur result shares many characteristics with a hypothesis by Feige (2002) on the hardness of refuting\nrandom satisfiability formulas. 
Indeed, our statement is also about the hardness of verifying that\na property holds for a particular instance (RIP for design matrices, instead of unsatisfiability for\nboolean formulas). It concerns a regime where such a property should hold with high probability (n\nof order k^{1+\u03b1}/\u03b8^2, linear regime for satisfiability), cautiously allowing only one type of error, false\nnegatives, for a problem that is hard in the worst case. In these two examples, such certifiers exist in\na sub-optimal regime. Our problem is conceptually different from results regarding the worst-case\nhardness of certifying this property (see, e.g. Bandeira et al., 2012; Koiran and Zouzias, 2012; Tillmann\nand Pfetsch, 2014). It is closer to another line of work concerned with computational lower\nbounds for statistical learning problems based on average-case assumptions. The planted clique\nassumption has been used to prove computational hardness results for statistical problems such as\nestimation and testing of sparse principal components (Berthet and Rigollet, 2013a,b; Wang, Berthet\nand Samworth, 2016), testing and localization of submatrix signals (Ma and Wu, 2013; Chen and\nXu, 2014), community detection (Hajek, Wu and Xu, 2015) and sparse canonical correlation analysis\n(Gao, Ma and Zhou, 2014). The intractability of the noisy parity recovery problem (Blum, Kalai\nand Wasserman, 2003) has also been used recently as an average-case assumption to deduce computational\nhardness of detection of satisfiability formulas with lightly planted solutions (Berthet and\nEllenberg, 2015). Additionally, several unconditional computational hardness results are shown for\nstatistical problems under constraints of learning models (Feldman et al., 2013). The present work\nhas two main differences compared to previous computational lower bound results. 
First, in a detection\nsetting, these lower bounds concern two specific distributions (for the null and alternative\nhypothesis), while ours is valid for all sub-Gaussian distributions, and there is no alternative distribution.\nSecondly, our result is not based on the usual assumption for the Planted Clique problem.\nInstead, we use a weaker assumption on a problem of detecting planted dense graphs. This does\nnot mean that the planted graph is a random graph with edge probability q > 1/2 as considered\nin (Arias-Castro and Verzelen, 2013; Bhaskara et al., 2010; Awasthi et al., 2015), but that it can be\nany graph with an unexpectedly high number of edges (see Section 3.1). This choice is made to\nstrengthen our result: it would \u2018survive\u2019 the discovery of an algorithm that would use very specific\nproperties of cliques (or even of random dense graphs) to detect their presence. As a consequence,\nthe analysis of our reduction is more technically complicated.\nOur work is organized in the following manner: We recall in Section 1 the definition of the restricted\nisometry property, and some of its known properties. In Section 2, we define the notion of certifier,\nand prove the existence of a computationally efficient certifier in a sub-optimal regime. Our main\nresult is developed in Section 3, focused on the hardness of average-case certification. The proofs\nof the main results are in Appendix A of the supplementary material and those of auxiliary results\nin Appendix B of the supplementary material.\n\n1 Restricted Isometry Property\n\n1.1 Formulation\n\nWe use the definition of Cand\u00e8s and Tao (2005), who introduced this notion. Below, for a vector\nu \u2208 R^p, we define \u2016u\u2016_0 to be the number of its non-zero entries.\nDefinition (RIP). A matrix X \u2208 R^{n\u00d7p} satisfies the restricted isometry property with sparsity k \u2208\n{1, . . . 
, p} and distortion \u03b8 \u2208 (0, 1), denoted by X \u2208 RIP_{n,p}(k, \u03b8), if it holds that\n\n1 \u2212 \u03b8 \u2264 \u2016Xu\u2016_2^2 \u2264 1 + \u03b8,\n\nfor every u \u2208 S^{p\u22121}(k) := {u \u2208 R^p : \u2016u\u2016_2 = 1, \u2016u\u2016_0 \u2264 k}.\nThis can be equivalently defined by a property on submatrices of the design matrix: X is in\nRIP_{n,p}(k, \u03b8) if and only if for any set S of k columns of X, the submatrix, X_{\u2217S}, formed by taking\nthese columns is almost an isometry, i.e. if the spectrum of its Gram matrix is contained in the\ninterval [1 \u2212 \u03b8, 1 + \u03b8]:\n\n\u2016X_{\u2217S}^\u22a4 X_{\u2217S} \u2212 I_k\u2016_op \u2264 \u03b8 .\n\nDenote by \u2016 \u00b7 \u2016_{op,k} the k-sparse operator norm, defined for a matrix A as \u2016A\u2016_{op,k} =\nsup_{x\u2208S^{p\u22121}(k)} \u2016Ax\u2016_2. This yields another equivalent formulation of the RIP property: X \u2208\nRIP_{n,p}(k, \u03b8) if and only if\n\n\u2016X^\u22a4X \u2212 I_p\u2016_{op,k} \u2264 \u03b8 .\n\nWe assume in the following discussion that the distortion parameter \u03b8 is upper-bounded by 1. For\nv \u2208 R^p and T \u2286 {1, . . . , p}, we write v_T for the #T-dimensional vector obtained by restricting\nv to coordinates indexed by T. Similarly, for an n \u00d7 p matrix A and subsets S \u2286 {1, . . . , n} and\nT \u2286 {1, . . . , p}, we write A_{S\u2217} for the submatrix obtained by restricting A to rows indexed by S,\nand A_{\u2217T} for the submatrix obtained by restricting A to columns indexed by T.\n\n1.2 Generation via random design\n\nMatrices that satisfy the restricted isometry property have many interesting applications in high-dimensional\nstatistics and compressed sensing. 
However, there is no known way to generate them\ndeterministically in general. It is even NP-hard to check whether a given matrix X belongs to\nRIP_{n,p}(k, \u03b8) (see, e.g. Bandeira et al., 2012). Several deterministic constructions of RIP matrices\nexist for sparsity level k \u2272 \u03b8\u221an. For example, using equiangular tight frames and Gershgorin\u2019s\ncircle theorem, one can construct RIP matrices with sparsity k \u2264 \u221an and distortion \u03b8 bounded\naway from 0 (see, e.g. Bandeira et al., 2012). The limitation k \u2264 \u03b8\u221an is known as the \u2018square\nroot bottleneck\u2019. To date, the only constructions that break the \u2018square root bottleneck\u2019 are due to\nBourgain et al. (2011) and Bandeira, Mixon and Moreira (2014), both of which give RIP guarantees\nfor k of order n^{1/2+\u03f5} for some small \u03f5 > 0 and fixed \u03b8 (the latter construction is conditional on a\nnumber-theoretic conjecture being true).\nInterestingly though, it is easy to generate large matrices satisfying the restricted isometry property\nthrough random design, and compared to the fixed design matrices mentioned in the previous paragraph,\nthese random design constructions are much less restrictive on the sparsity level, typically\nallowing k up to the order n/ log(p) (assuming \u03b8 is bounded away from zero). They can be constructed\neasily from any centred sub-Gaussian distribution. We recall that a distribution Q (and its\nassociated random variable) is said to be sub-Gaussian with parameter \u03c3 if \u222b_R e^{\u03bbx} dQ(x) \u2264 e^{\u03bb^2\u03c3^2/2}\nfor all \u03bb \u2208 R.\nDefinition. Define \ud835\udcac = \ud835\udcac_\u03c3 to be the set of sub-Gaussian distributions Q over R with zero mean,\nunit variance, and sub-Gaussian parameter at most \u03c3.\nThe most common choice for a Q \u2208 \ud835\udcac is the standard normal distribution N (0, 1). Note that by\nTaylor expansion, for any Q \u2208 \ud835\udcac, we necessarily have \u03c3^2 \u2265 \u222b_R x^2 dQ(x) = 1. In the rest of the\npaper, we treat \u03c3 as fixed. Define the normalized distribution Q\u0303 to be the distribution of Z/\u221an for\nZ \u223c Q. The following well-known result states that by concentration of measure, random matrices\ngenerated with distribution Q\u0303^{\u2297(n\u00d7p)} satisfy restricted isometries (see, e.g. Cand\u00e8s and Tao (2005)\nand Baraniuk et al. (2008)). For completeness, we include a proof that establishes these particular\nconstants stated here. All proofs are deferred to Appendix A or Appendix B of the supplementary\nmaterial.\nProposition 1. Suppose X is a random matrix with distribution Q\u0303^{\u2297(n\u00d7p)}, where Q \u2208 \ud835\udcac. It holds\nthat\n\nP(X \u2208 RIP_{n,p}(k, \u03b8)) \u2265 1 \u2212 2 exp{ k log(9ep/k) \u2212 n\u03b8^2/(256\u03c3^4) }.   (1)\n\nIn order to clarify the notion of asymptotic regimes used in this paper, we introduce the following\ndefinition.\nDefinition. For 0 \u2264 \u03b1 \u2264 1, define the asymptotic regime\n\nR_\u03b1 := { (p_n, k_n, \u03b8_n)_n : p, k \u2192 \u221e and n \u226b k_n^{1+\u03b1} log(p_n)/\u03b8_n^2 }.\n\nWe note that in this notation, Proposition 1 implies that for (p, k, \u03b8) = (p_n, k_n, \u03b8_n) \u2208 R_0 we have\nlim_{n\u2192\u221e} Q\u0303^{\u2297(n\u00d7p)}(X \u2208 RIP_{n,p}(k, \u03b8)) = 1, and this convergence is uniform over Q \u2208 \ud835\udcac.\n\n2 Certification of Restricted Isometry\n\n2.1 Objectives and definition\n\nIn practice, it is useful to know with certainty whether a particular realization of a random design\nmatrix satisfies the RIP condition. It is known that the problem of deciding if a given matrix is RIP\nis NP-hard (Bandeira et al., 2012). However, NP-hardness is only a statement about worst-case\ninstances. 
It would still be of great use to have an algorithm that can correctly decide the RIP property\nfor an average instance of a design matrix, with some accuracy. Such an algorithm should identify a\nhigh proportion of RIP matrices generated through random design and make no false positive claims.\nWe call such an algorithm an average-case certifier, or a certifier for short.\nDefinition (Certifier). Given a parameter sequence (p, k, \u03b8) = (p_n, k_n, \u03b8_n), we define a certifier for\nQ\u0303^{\u2297(n\u00d7p)}-random matrices to be a sequence (\u03c8_n)_n of measurable functions \u03c8_n : R^{n\u00d7p} \u2192 {0, 1},\nsuch that\n\n\u03c8_n^{\u22121}(1) \u2286 RIP_{n,p}(k, \u03b8)   and   lim sup_{n\u2192\u221e} Q\u0303^{\u2297(n\u00d7p)}(\u03c8_n^{\u22121}(0)) \u2264 1/3.   (2)\n\nNote the definition of a certifier depends on both the asymptotic parameter sequence (p_n, k_n, \u03b8_n) and\nthe sub-Gaussian distribution Q. However, when it is clear from the context, we will suppress the\ndependence and refer to certifiers for RIP_{n,p}(k, \u03b8) properties of Q\u0303^{\u2297(n\u00d7p)}-random matrices simply\nas \u2018certifiers\u2019.\nThe two defining properties in (2) can be understood as follows. The first condition means that if a\ncertifier outputs 1, we know with certainty that the matrix is RIP. The second condition means that\nthe certifier is not overly conservative; it is allowed to output 0 for at most one third (with respect\nto Q\u0303^{\u2297(n\u00d7p)} measure) of the matrices. The choice of 1/3 in the definition of a certifier is made to\nsimplify proofs. However, all subsequent results will still hold if we replace 1/3 by any constant in\n(0, 1). 
In view of Proposition 1, the second condition in (2) can be equivalently stated as\n\nlim inf_{n\u2192\u221e} Q\u0303^{\u2297(n\u00d7p)}{ \u03c8_n(X) = 1 | X \u2208 RIP_{n,p}(k, \u03b8) } \u2265 2/3.\n\nWith such a certifier, given an arbitrary problem fitting the sparse linear model, the matrix X could\nbe tested for the restricted isometry property, with some expectation of a positive result. This would\nbe particularly interesting given a certifier in the parameter regime n \u226a k_n^2/\u03b8_n^2, in which presently\nknown polynomial-time certifiers cannot give positive results.\nEven though it is not the main focus of our paper, we also note that a certifier \u03c8 with the above\nproperties for some distribution Q \u2208 \ud835\udcac would form a certifier/distribution couple (\u03c8, Q), that yields\nin the usual manner a Las Vegas algorithm to generate RIP matrices. The (random) algorithm keeps\ngenerating random matrices X \u223c Q\u0303^{\u2297(n\u00d7p)} until \u03c8_n(X) = 1. The number of times that the certifier\nis invoked has a geometric distribution with success probability Q\u0303^{\u2297(n\u00d7p)}(\u03c8_n^{\u22121}(1)). Hence, the\nLas Vegas algorithm runs in randomized polynomial time if and only if \u03c8_n runs in randomized\npolynomial time.\n\n2.2 Certifier properties\n\nAlthough our focus is on algorithmically efficient certifiers, we establish first the properties of a\ncertifier that is computationally intractable. This certifier serves as a benchmark for the performance\nof other candidates. Indeed, we exhibit in the following proposition a certifier, based on the k-sparse\noperator norm, that works uniformly well in the same asymptotic parameter regime R_0, where\nQ\u0303^{\u2297(n\u00d7p)}-random matrices are RIP with asymptotic probability 1. 
For clarity, we stress that our\ncriterion when judging a certifier will always be its uniform performance over asymptotic regimes\nR_\u03b1 for some \u03b1 \u2208 [0, 1].\nProposition 2. Suppose (p, k, \u03b8) = (p_n, k_n, \u03b8_n) \u2208 R_0. Furthermore, let Q \u2208 \ud835\udcac and X \u223c\nQ\u0303^{\u2297(n\u00d7p)}. Then the sequence of tests (\u03c8_{op,k})_n based on sparse operator norms, defined by\n\n\u03c8_{op,k}(X) := 1{ \u2016X^\u22a4X \u2212 I_p\u2016_{op,k} \u2264 \u03b8 },\n\nis a certifier for Q\u0303^{\u2297(n\u00d7p)}-random matrices.\n\nBy a direct reduction from the clique problem, one can show that it is NP-hard to compute the k-sparse\noperator norm of a matrix. Hence the certifier \u03c8_{op,k} is computationally intractable. The next\nproposition concerns the certifier property of a test based on the maximum incoherence between\ncolumns of the design matrix. It follows directly from a well-known result on the incoherence\nparameter of a random matrix (see, e.g. Rauhut and Foucart (2013, Proposition 6.2)) and allows the\nconstruction of a polynomial-time certifier that works uniformly well in the asymptotic parameter\nregime R_1.\nProposition 3. Suppose (p, k, \u03b8) = (p_n, k_n, \u03b8_n) satisfies n \u2265 196\u03c3^4 k^2 log(p)/\u03b8^2. 
Let Q \u2208 \ud835\udcac and\nX \u223c Q\u0303^{\u2297(n\u00d7p)}. Then the sequence of tests \u03c8_\u221e defined by\n\n\u03c8_\u221e(X) := 1{ \u2016X^\u22a4X \u2212 I_p\u2016_\u221e \u2264 14\u03c3^2 \u221a(log(p)/n) }\n\nis a certifier for Q\u0303^{\u2297(n\u00d7p)}-random matrices.\n\nProposition 3 shows that, when the sample size n is above k^2 log(p)/\u03b8^2 in magnitude (in particular,\nthis is satisfied asymptotically when (p, k, \u03b8) = (p_n, k_n, \u03b8_n) \u2208 R_1), there is a polynomial time\ncertifier. In other words, in this high-signal regime, the average-case decision problem for the RIP\nproperty is much more tractable than indicated by the worst-case result. On the other hand, the\ncertifier in Proposition 3 works in a much smaller parameter range when compared to \u03c8_{op,k} in\nProposition 2. Combining Propositions 2 and 3, we have the following schematic diagram (Figure 1).\nWhen the sample size is lower than specified in R_0, the property does not hold, with high probability,\nand no certifier exists. A computationally intractable certifier works uniformly over R_0. On the other\nend of the spectrum, when the sample size is large enough to be in R_1, a simple certifier based on\nthe maximum incoherence of the design matrix is known to work in polynomial time. This leaves\nopen the question of whether (randomized) polynomial time certifiers can work uniformly well in\nR_0, or R_\u03b1 for any \u03b1 \u2208 [0, 1). 
We will see in the next section that, assuming a weaker variant of\nthe Planted Clique hypothesis from computational complexity theory, R_1 is essentially the largest\nasymptotic regime where a randomized polynomial time certifier can exist.\n\nFigure 1: Schematic diagram for existence of certifiers in different asymptotic regimes.\n\n3 Hardness of Certification\n\n3.1 Planted dense subgraph assumptions\n\nWe show in this section that certification of the RIP property is an average-case hard problem in the\nparameter regime R_\u03b1 for any \u03b1 < 1. This is precisely the regime not covered by Proposition 3. The\naverage-case hardness result is proved via reduction to the planted dense subgraph assumption.\nFor any integer m \u2265 0, denote by G_m the collection of all graphs on m vertices. We write V (G)\nand E(G) for the set of vertices and edges of a graph G. For H \u2208 G_\u03ba where \u03ba \u2208 {0, . . . , m}, let\nG(m, 1/2, H) be the random graph model that generates a random graph G on m vertices as follows.\nIt first picks \u03ba random vertices K \u2286 V (G) and plants an isomorphic copy of H on these \u03ba vertices,\nthen every pair of vertices not in K \u00d7 K is connected by an edge independently with probability\n1/2. We write P_H for the probability measure on G_m associated with G(m, 1/2, H). Note that if H\nis the empty graph, then G(m, 1/2, \u2205) describes the Erd\u0151s\u2013R\u00e9nyi random graph. With a slight abuse\nof notation, we write P_0 in place of P_\u2205. On the other hand, for \u03f5 \u2208 (0, 1/2], if H belongs to the set\n\n\u210b = \u210b_{\u03ba,\u03f5} := { H \u2208 G_\u03ba : #E(H) \u2265 (1/2 + \u03f5) \u03ba(\u03ba \u2212 1)/2 },\n\nthen G(m, 1/2, H) generates random graphs that contain elevated local edge density. 
The planted\ndense graph problem concerns distinguishing between the following two hypotheses:\n\nH0 : G \u223c G(m, 1/2, \u2205)   and   H1 : G \u223c G(m, 1/2, H) for some H \u2208 \u210b_{\u03ba,\u03f5}.   (3)\n\nIt is widely believed that for \u03ba = O(m^{1/2\u2212\u03b4}), there do not exist randomized polynomial time\ntests to distinguish between H0 and H1 (see, e.g. Jerrum (1992); Feige and Krauthgamer (2003);\nFeldman et al. (2013)). More precisely, we have the following assumption.\nAssumption (A1). Fix \u03f5 \u2208 (0, 1/2] and \u03b4 \u2208 (0, 1/2). Let (\u03ba_m)_m be any sequence of integers\nsuch that \u03ba_m \u2192 \u221e and \u03ba_m = O(m^{1/2\u2212\u03b4}). For any sequence of randomized polynomial time tests\n(\u03c6_m : G_m \u2192 {0, 1})_m, we have\n\nlim inf_m { P_0(\u03c6(G) = 1) + max_{H\u2208\u210b_{\u03ba,\u03f5}} P_H(\u03c6(G) = 0) } > 1/3 .\n\nWe remark that if \u03f5 = 1/2, then \u210b_{\u03ba,\u03f5} contains only the \u03ba-complete graph and the testing problem\nbecomes the well-known planted clique problem (cf. Jerrum (1992) and references in Berthet and\nRigollet (2013a,b)).\nThe difficulty of this problem has been used as a primitive for the hardness of other tasks, such\nas cryptographic applications, in Juels and Peinado (2000), testing for k-wise dependence in Alon\net al. (2007), and approximating Nash equilibria in Hazan and Krauthgamer (2011). In this case, Assumption\n(A1) is a version of the planted clique hypothesis (see, e.g. Berthet and Rigollet (2013b,\nAssumption APC)). We emphasize that Assumption (A1) is significantly milder than the planted\nclique hypothesis (since it allows any \u03f5 \u2208 (0, 1/2]), or than a hypothesis on planted random graphs.\nWe also note that when \u03ba \u2265 C_\u03f5 \u221am, spectral methods can be used to detect such graphs with\nhigh probability. Indeed, when G contains a planted graph from \u210b and A_G denotes its adjacency matrix,\nA_G \u2212 11^\u22a4/2 has a leading eigenvalue greater than \u03f5(\u03ba \u2212 1), whereas it is of order \u221am for a usual\nErd\u0151s\u2013R\u00e9nyi random graph.\nThe following theorem relates the hardness of the planted dense subgraph testing problem to the\nhardness of certifying restricted isometry of random matrices. We recall that the distribution of X is\nthat of an n \u00d7 p random matrix with entries independently and identically sampled from Q\u0303, the distribution of Z/\u221an for Z \u223c Q,\nfor some Q \u2208 \ud835\udcac. We also write \u03a8_rp for the class of randomized polynomial time certifiers.\n\nTheorem 4. Assume (A1) and fix any \u03b1 \u2208 [0, 1). Then there exists a sequence (p, k, \u03b8) =\n(p_n, k_n, \u03b8_n) \u2208 R_\u03b1, such that there is no certifier/distribution couple (\u03c8, Q) \u2208 \u03a8_rp \u00d7 \ud835\udcac with respect\nto this sequence of parameters.\n\nOur proof of Theorem 4 relies on the following ideas: Given a graph G, an instance of the planted\nclique problem in the assumed hard regime, we construct n random vectors based on the adjacency\nmatrix of a bipartite subgraph of G, between two random sets of vertices. Each coefficient of these\nvectors is then randomly drawn from one of two carefully chosen distributions, conditionally on the\npresence or absence of a particular edge. This construction ensures that if the graph is an Erd\u0151s\u2013R\u00e9nyi\nrandom graph (i.e. with no planted graph), the vectors are independent with independent\ncoefficients, with distribution Q\u0303. Otherwise, we show that with high probability, the presence of an\nunusually dense subgraph will make it very likely that the matrix does not satisfy the restricted isometry\nproperty, for a set of parameters in R_\u03b1. 
As a consequence, if there existed a certifier/distribution\ncouple (\u03c8, Q) \u2208 \u03a8_rp \u00d7 \ud835\udcac in this range of parameters, it could be used, by taking the newly constructed\nmatrix as input to the certifier, to determine with high probability the distribution of G,\nviolating our assumption (A1).\nWe remark that this result holds for any distribution in \ud835\udcac, in contrast to computational lower bounds\nin statistical learning problems, that apply to a specific distribution. For the sake of simplicity, we\nhave kept the coefficients of X identically distributed, but our analysis is not dependent on that\nfact, and our result can be directly extended to the case where the coefficients are independent, with\ndifferent distributions in \ud835\udcac.\nTheorem 4 may be viewed as providing an asymptotic lower bound of the sample size n for the\nexistence of a computationally feasible certifier. It establishes this computational lower bound by\nexhibiting some specific \u2018hard\u2019 sequences of parameters inside R_\u03b1, and showing that any algorithm\nviolating the computational lower bound could be exploited to solve the planted dense subgraph\nproblem. All hardness results, whether in a worst-case (NP-hardness, or other) or the average-case\n(by reduction from a hard problem), are by nature statements on the impossibility of accomplishing a\ntask in a computationally efficient manner, uniformly over a range of parameters. They are therefore\nalways based on the construction of a \u2018hard\u2019 sequence of parameters used in the reduction, for\nwhich a contradiction is shown. Here, the \u2018hard\u2019 sequence is explicitly constructed in the proof\nto be some (p, k, \u03b8) = (p_n, k_n, \u03b8_n) satisfying p \u2265 n and n^{1/(3\u2212\u03b1\u22124\u03b2)} \u226a k \u226a n^{1/(2\u2212\u03b2)\u2212\u03b4}, for\n\u03b2 \u2208 [0, (1 \u2212 \u03b1)/3) and any small \u03b4 > 0. 
The tuning parameter \u03b2 allows additional flexibility\nin choosing these \u2018hard\u2019 sequences. More precisely, using an averaging trick first seen in Ma and\nWu (2013), we are able to show that the existence of such \u2018hard\u2019 sequences is not confined to\nthe sparsity regime k \u226a n^{1/2}. We note that in all our \u2018hard\u2019 sequences, \u03b8_n must depend on n. An\ninteresting extension is to see if similar computational lower bounds hold when restricted to a subset\nof R_\u03b1 where \u03b8 is constant.\n\nReferences\n\nAlon, N., Andoni, A., Kaufman, T., Matulef, K., Rubinfeld, R., and Xie, N. (2007) Testing k-wise and almost k-wise independence. Proceedings of the Thirty-ninth ACM STOC, 496\u2013505.\n\nArias-Castro, E. and Verzelen, N. (2013) Community Detection in Dense Random Networks. Ann. Statist., 42, 940\u2013969.\n\nAwasthi, P., Charikar, M., Lai, K. A. and Risteski, A. (2015) Label optimal regret bounds for online local learning. J. Mach. Learn. Res. (COLT), 40.\n\nBandeira, A. S., Dobriban, E., Mixon, D. G. and Sawin, W. F. (2012) Certifying the restricted isometry property is hard. IEEE Trans. Inform. Theory, 59, 3448\u20133450.\n\nBandeira, A. S., Mixon, D. G. and Moreira, J. (2014) A conditional construction of restricted isometries. International Mathematics Research Notices, to appear.\n\nBaraniuk, R., Davenport, M., DeVore, R. and Wakin, M. (2008) A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28, 253\u2013263.\n\nBerthet, Q. and Ellenberg, J. S. (2015) Detection of Planted Solutions for Flat Satisfiability Problems. Preprint.\n\nBerthet, Q. and Rigollet, P. (2013) Optimal detection of sparse principal components in high dimension. Ann. Statist., 41, 1780\u20131815.\n\nBerthet, Q. and Rigollet, P. (2013) Complexity theoretic lower bounds for sparse principal component detection. J. Mach. Learn. Res. 
(COLT), 30, 1046\u20131066.\n\nBhaskara, A., Charikar, M., Chlamtac, E., Feige, U. and Vijayaraghavan, A. (2010) Detecting High Log-Densities: an O(n^{1/4}) Approximation for Densest k-Subgraph. Proceedings of the forty-second ACM symposium on Theory of computing, 201\u2013210.\n\nBickel, P., Ritov, Y. and Tsybakov, A. (2009) Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37, 1705\u20131732.\n\nBlum, A., Kalai, A. and Wasserman, H. (2003) Noise-tolerant learning, the parity problem, and the statistical query model. Journal of the ACM, 50, 506\u2013519.\n\nBlumensath, T. and Davies, M. E. (2009) Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27, 265\u2013274.\n\nBourgain, J., Dilworth, S., Ford, K. and Konyagin, S. (2011) Explicit constructions of RIP matrices and related problems. Duke Math. J., 159, 145\u2013185.\n\nCand\u00e8s, E. J. (2008) The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346, 589\u2013592.\n\nCand\u00e8s, E. J., Romberg, J. and Tao, T. (2006) Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52, 489\u2013509.\n\nCand\u00e8s, E. J., Romberg, J. K. and Tao, T. (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on pure and applied mathematics, 59, 2006.\n\nCand\u00e8s, E. J. and Tao, T. (2005) Decoding by Linear Programming. IEEE Trans. Inform. Theory, 51, 4203\u20134215.\n\nChen, Y. and Xu, J. (2014) Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. preprint, arXiv:1402.1267.\n\nd\u2019Aspremont, A., Bach, F. and El Ghaoui, L. (2008) Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res., 9, 1269\u20131294.\n\nd\u2019Aspremont, A. and El Ghaoui, L. 
(2011) Testing the nullspace property using semidefinite programming. Mathematical programming, 127, 123\u2013144.\n\nDai, W. and Milenkovic, O. (2009) Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inform. Theory, 55, 2230\u20132249.\n\nDonoho, D. L. (2006) Compressed sensing. IEEE Trans. Inform. Theory, 52, 1289\u20131306.\n\nDonoho, D. L. and Elad, M. (2003) Optimally sparse representation in general (nonorthogonal) dictionaries via \u2113_1 minimization. Proceedings of the National Academy of Sciences, 100, 2197\u20132202.\n\nDonoho, D. L., Elad, M. and Temlyakov, V. N. (2006) Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory, 52, 6\u201318.\n\nEldar, Y. C. and Kutyniok, G. (2012) Compressed Sensing: Theory and Applications. Cambridge University Press, Cambridge.\n\nFeige, U. Relations between average case complexity and approximation complexity. Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, 534\u2013543.\n\nFeige, U. and Krauthgamer, R. (2003) The probable value of the Lov\u00e1sz\u2013Schrijver relaxations for a maximum independent set. SIAM J. Comput., 32, 345\u2013370.\n\nFeldman, V., Grigorescu, E., Reyzin, L., Vempala, S. S. and Xiao, Y. (2013) Statistical Algorithms and a Lower Bound for Detecting Planted Cliques. Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, 655\u2013664.\n\nGao, C., Ma, Z. and Zhou, H. H. (2014) Sparse CCA: adaptive estimation and computational barriers. preprint, arXiv:1409.8565.\n\nHajek, B., Wu, Y. and Xu, J. (2015) Computational Lower Bounds for Community Detection on Random Graphs. Proceedings of The 28th Conference on Learning Theory, 899\u2013928.\n\nHazan, E. and Krauthgamer, R. (2011) How hard is it to approximate the best Nash equilibrium? SIAM J. Comput., 40, 79\u201391.\n\nJerrum, M. (1992) Large cliques elude the Metropolis process. 
Random Struct. Algor., 3, 347\u2013359.\n\nJuditsky, A. and Nemirovski, A. (2011) On verifiable sufficient conditions for sparse signal recovery via \u2113_1 minimization. Mathematical programming, 127, 57\u201388.\n\nJuels, A. and Peinado, M. (2000) Hiding cliques for cryptographic security. Des. Codes Cryptography, 20, 269\u2013280.\n\nKoiran, P. and Zouzias, A. (2012) Hidden cliques and the certification of the restricted isometry property. preprint, arXiv:1211.0665.\n\nLee, K. and Bresler, Y. (2008) Computing performance guarantees for compressed sensing. IEEE International Conference on Acoustics, Speech and Signal Processing, 5129\u20135132.\n\nMa, Z. and Wu, Y. (2013) Computational barriers in minimax submatrix detection. arXiv preprint.\n\nMallat, S. (1999) A wavelet tour of signal processing. Academic Press, Cambridge, MA.\n\nNeedell, D. and Tropp, J. A. (2009) CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26, 301\u2013321.\n\nRauhut, H. and Foucart, S. (2013) A Mathematical Introduction to Compressive Sensing. Birkh\u00e4user.\n\nTillmann, A. N. and Pfetsch, M. E. (2014) The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing. IEEE Trans. Inform. Theory, 60, 1248\u20131259.\n\nvan de Geer, S. and B\u00fchlmann, P. (2009) On the conditions used to prove oracle results for the lasso. Electron. J. Stat., 3, 1360\u20131392.\n\nWang, T., Berthet, Q. and Samworth, R. J. (2016) Statistical and computational trade-offs in Estimation of Sparse Principal Components. Ann. Statist., 45, 1896\u20131930.\n\nZhang, Y., Wainwright, M. J. and Jordan, M. I. (2014) Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. 
JMLR: Workshop and Conference Proceedings (COLT), 35, 921\u2013948.", "award": [], "sourceid": 1901, "authors": [{"given_name": "Tengyao", "family_name": "Wang", "institution": "University of Cambridge"}, {"given_name": "Quentin", "family_name": "Berthet", "institution": "University of Cambridge"}, {"given_name": "Yaniv", "family_name": "Plan", "institution": "University of British Columbia"}]}