{"title": "Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula", "book": "Advances in Neural Information Processing Systems", "page_first": 424, "page_last": 432, "abstract": "Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows to express the minimal mean-square-error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information theoretically. Additionally, the proof technique has an interest of its own and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm and the theory of spatial coupling and threshold saturation in coding. 
Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available.", "full_text": "Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula

Jean Barbier, Mohamad Dia and Nicolas Macris
Laboratoire de Théorie des Communications, Faculté Informatique et Communications, Ecole Polytechnique Fédérale de Lausanne, 1015, Suisse.
firstname.lastname@epfl.ch

Florent Krzakala
Laboratoire de Physique Statistique, CNRS, PSL Universités et Ecole Normale Supérieure, Sorbonne Universités et Université Pierre & Marie Curie, 75005, Paris, France.
florent.krzakala@ens.fr

Thibault Lesieur and Lenka Zdeborová
Institut de Physique Théorique, CNRS, CEA, Université Paris-Saclay, F-91191, Gif-sur-Yvette, France.
lesieur.thibault@gmail.com, lenka.zdeborova@gmail.com

Abstract

Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows us to express the minimal mean-square-error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information theoretically.
Additionally, the proof technique has an interest of its own and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available.

Consider the following probabilistic rank-one matrix estimation problem: one has access to noisy observations w = (w_ij)_{i,j=1}^n of the pair-wise products of the components of a vector s = (s_1, …, s_n)^⊺ ∈ R^n with i.i.d components distributed as S_i ∼ P_0, i = 1, …, n. The entries of w are observed through a noisy element-wise (possibly non-linear) output probabilistic channel P_out(w_ij | s_i s_j/√n). The goal is to estimate the vector s from w assuming that both P_0 and P_out are known and independent of n (the noise is symmetric so that w_ij = w_ji). Many important problems in statistics and machine learning can be expressed in this way, such as sparse PCA [1], the Wigner spiked model [2, 3], community detection [4] or matrix completion [5].

Proving a result initially derived by a heuristic method from statistical physics, we give an explicit expression for the mutual information (MI) and the information theoretic minimal mean-square-error (MMSE) in the asymptotic limit n → ∞. Our results imply that for a large region of parameters, the posterior marginal expectations of the underlying signal components (often assumed intractable to compute) can be obtained in the leading order in n using a polynomial-time algorithm called approximate message-passing (AMP) [6, 3, 4, 7].

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
We also demonstrate the existence of a region where both AMP and spectral methods [8] fail to provide a good answer to the estimation problem, while it is nevertheless information theoretically possible to do so. We illustrate our theorems with examples and also briefly discuss the implications in terms of computational complexity.

1 Setting and main results

The additive white Gaussian noise setting: A standard and natural setting is the case of additive white Gaussian noise (AWGN) of known variance Δ, w_ij = s_i s_j/√n + z_ij √Δ, where z = (z_ij)_{i,j=1}^n is a symmetric matrix with i.i.d entries Z_ij ∼ N(0, 1), 1 ≤ i ≤ j ≤ n. Perhaps surprisingly, it turns out that this Gaussian setting is sufficient to completely characterize all the problems discussed in the introduction, even if these have more complicated output channels. This is made possible by a theorem of channel universality [9] (already proven for community detection in [4] and conjectured in [10]). This theorem states that given an output channel P_out(w|y), such that (s.t) log P_out(w|y = 0) is three times differentiable with bounded second and third derivatives, the MI satisfies I(S; W) = I(S; SS^⊺/√n + Z√Δ) + O(√n), where Δ is the inverse Fisher information (evaluated at y = 0) of the output channel: Δ⁻¹ := E_{P_out(w|0)}[(∂_y log P_out(W|y)|_{y=0})²]. Informally, this means that we only have to compute the MI for an AWGN channel to take care of a wide range of problems, which can be expressed in terms of their Fisher information. In this paper we derive rigorously, for a large class of signal distributions P_0, an explicit one-letter formula for the MI per variable I(S; W)/n in the asymptotic limit n → ∞.

Main result: Our central result is a proof of the expression for the asymptotic n → ∞ MI per variable via the so-called replica symmetric (RS) potential i_RS(E; Δ) defined as

i_RS(E; Δ) := ((v − E)² + v²)/(4Δ) − E_{S,Z}[ln(∫ dx P_0(x) e^{−x²/(2Σ(E;Δ)²) + x(S/Σ(E;Δ)² + Z/Σ(E;Δ))})],    (1)

with Z ∼ N(0, 1), S ∼ P_0, E[S²] = v and Σ(E; Δ)² := Δ/(v − E), E ∈ [0, v]. Here we will assume that P_0 is a discrete distribution over a finite bounded real alphabet, P_0(s) = Σ_{α=1}^ν p_α δ(s − a_α). Thus the only continuous integral in (1) is the Gaussian over z. Our results can be extended to mixtures of discrete and continuous signal distributions at the expense of technical complications in some proofs.

It turns out that both the information theoretic and algorithmic AMP thresholds are determined by the set of stationary points of (1) (w.r.t E). It is possible to show that for all Δ > 0 there always exists at least one stationary point that is a minimum. Note E = 0 is never a stationary point (except for P_0 a single Dirac mass) and E = v is stationary only if E[S] = 0. In this contribution we suppose that at most three stationary points exist, corresponding to situations with at most one phase transition. We believe that situations with multiple transitions can also be covered by our techniques.

Theorem 1.1 (RS formula for the mutual information) Fix Δ > 0 and let P_0 be a discrete distribution s.t (1) has at most three stationary points. Then lim_{n→∞} I(S; W)/n = min_{E∈[0,v]} i_RS(E; Δ).

The proof of the existence of the limit does not require the above hypothesis on P_0.
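Equation (1) is a one-dimensional scalar optimization, so it is straightforward to evaluate numerically. The following is a minimal sketch (our own illustration, not part of the paper): it evaluates i_RS(E; Δ) for a Bernoulli prior P_0 = (1 − ρ)δ_0 + ρδ_1, using Gauss-Hermite quadrature for the expectation over Z, and locates the minimizer on a grid; the values ρ = 0.1 and Δ = 0.002 are arbitrary choices.

```python
import numpy as np

def i_rs(E, delta, a=(0.0, 1.0), p=(0.9, 0.1), n_gh=64):
    """RS potential (1) for a discrete prior sum_alpha p_alpha * delta(s - a_alpha)."""
    a, p = np.array(a), np.array(p)
    v = np.sum(p * a**2)                      # v = E[S^2]
    sigma2 = delta / (v - E)                  # Sigma(E; Delta)^2
    # Gauss-Hermite nodes for E_Z[f(Z)], Z ~ N(0,1): z = sqrt(2)*x, weights w/sqrt(pi)
    x, w = np.polynomial.hermite.hermgauss(n_gh)
    z = np.sqrt(2.0) * x
    wz = w / np.sqrt(np.pi)
    total = ((v - E)**2 + v**2) / (4.0 * delta)
    # Subtract E_{S,Z} ln( sum_alpha p_alpha exp(-a^2/(2 Sigma^2) + a (S/Sigma^2 + Z/Sigma)) )
    for s, ps in zip(a, p):
        arg = (-a[None, :]**2 / (2 * sigma2)
               + a[None, :] * (s / sigma2 + z[:, None] / np.sqrt(sigma2)))
        log_int = np.log(np.sum(p[None, :] * np.exp(arg), axis=1))
        total -= ps * np.sum(wz * log_int)
    return total

delta = 0.002
grid = np.linspace(1e-6, 0.1 - 1e-6, 400)   # E ranges over [0, v], here v = rho = 0.1
vals = [i_rs(E, delta) for E in grid]
E_star = grid[int(np.argmin(vals))]
print(E_star)   # minimizer of the RS potential over the grid
```

The minimizer E_star then enters the MMSE formulas of corollary 1.3 below.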
Also, it was first shown in [9] that for all n, I(S; W)/n ≤ min_{E∈[0,v]} i_RS(E; Δ), an inequality that we will use in the proof section. It is conceptually useful to define the following threshold:

Definition 1.2 (Information theoretic threshold) Define Δ_Opt as the first non-analyticity point of the MI as Δ increases: Δ_Opt := sup{Δ | lim_{n→∞} I(S; W)/n is analytic in ]0, Δ[}.

When P_0 is s.t (1) has at most three stationary points, as discussed below, then min_{E∈[0,v]} i_RS(E; Δ) has at most one non-analyticity point denoted Δ_RS (if min_{E∈[0,v]} i_RS(E; Δ) is analytic over all R₊ we set Δ_RS = ∞). Theorem 1.1 gives us a means to compute the information theoretic threshold Δ_Opt = Δ_RS. A basic application of theorem 1.1 is the expression of the MMSE:

Corollary 1.3 (Exact formula for the MMSE) For all Δ ≠ Δ_RS, the matrix-MMSE Mmmse_n := E_{S,W}[‖SS^⊺ − E[XX^⊺|W]‖²_F]/n² (‖·‖_F being the Frobenius norm) is asymptotically lim_{n→∞} Mmmse_n(Δ⁻¹) = v² − (v − argmin_{E∈[0,v]} i_RS(E; Δ))². Moreover, if Δ < Δ_AMP (where Δ_AMP is the algorithmic threshold, see definition 1.4) or Δ > Δ_RS, then the usual vector-MMSE Vmmse_n := E_{S,W}[‖S − E[X|W]‖²₂]/n satisfies lim_{n→∞} Vmmse_n = argmin_{E∈[0,v]} i_RS(E; Δ).

It is natural to conjecture that the vector-MMSE is given by argmin_{E∈[0,v]} i_RS(E; Δ) for all Δ ≠ Δ_RS, but our proof does not quite yield the full statement.

A fundamental consequence concerns the performance of the AMP algorithm [6] for estimating s. AMP has been analysed rigorously in [11, 12, 4] where it is shown that its asymptotic performance is tracked by state evolution (SE).
Let E^t := lim_{n→∞} E_{S,Z}[‖S − ŝ^t‖²₂]/n be the asymptotic average vector-MSE of the AMP estimate ŝ^t at time t. Define mmse(Σ⁻²) := E_{S,Z}[(S − E[X|S + ΣZ])²] as the usual scalar mmse function associated to a scalar AWGN channel of noise variance Σ², with S ∼ P_0 and Z ∼ N(0, 1). Then

E^{t+1} = mmse(Σ(E^t; Δ)⁻²),  E⁰ = v,    (2)

is the SE recursion. Monotonicity properties of the mmse function imply that E^t is a decreasing sequence s.t lim_{t→∞} E^t = E_∞ exists. Note that when E[S] = 0, v is an unstable fixed point of (2) and, as such, SE "does not start". While this is not really a problem when one runs AMP in practice, for analysis purposes one can slightly bias P_0 and remove the bias at the end of the proofs.

Definition 1.4 (AMP algorithmic threshold) For Δ > 0 small enough, the fixed point equation corresponding to (2) has a unique solution for all noise values in ]0, Δ[. We define Δ_AMP as the supremum of all such Δ.

Corollary 1.5 (Performance of AMP) In the limit n → ∞, AMP initialized without any knowledge other than P_0 yields upon convergence the asymptotic matrix-MMSE as well as the asymptotic vector-MMSE iff Δ < Δ_AMP or Δ > Δ_RS, namely E_∞ = argmin_{E∈[0,v]} i_RS(E; Δ).

Δ_AMP can be read off the replica potential (1): by differentiation of (1) one finds a fixed point equation that corresponds to (2).
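The SE recursion (2) can be iterated directly. A minimal sketch (our illustration; the Bernoulli prior with ρ = 0.1 and the two noise values are arbitrary choices), with the scalar mmse function computed by Gauss-Hermite quadrature:

```python
import numpy as np

A = np.array([0.0, 1.0])          # support of the Bernoulli prior P0
P = np.array([0.9, 0.1])          # weights: P0 = 0.9*delta_0 + 0.1*delta_1
V = float(np.sum(P * A**2))       # v = E[S^2] = rho = 0.1
X_GH, W_GH = np.polynomial.hermite.hermgauss(64)
Z = np.sqrt(2.0) * X_GH           # nodes for E_Z[.], Z ~ N(0,1)
WZ = W_GH / np.sqrt(np.pi)

def scalar_mmse(sigma2):
    """mmse(Sigma^{-2}) = E_{S,Z}[(S - E[X | S + Sigma*Z])^2] by quadrature."""
    out = 0.0
    for s, ps in zip(A, P):
        y = s + np.sqrt(sigma2) * Z                       # channel outputs for this S
        lik = P[None, :] * np.exp(-(y[:, None] - A[None, :])**2 / (2 * sigma2))
        post_mean = np.sum(lik * A[None, :], axis=1) / np.sum(lik, axis=1)
        out += ps * np.sum(WZ * (s - post_mean)**2)
    return out

def state_evolution(delta, t_max=200):
    E = V                                                 # E^0 = v
    for _ in range(t_max):
        E = scalar_mmse(delta / (V - E + 1e-12))          # Sigma(E; Delta)^2 = Delta/(v - E)
    return E

print(state_evolution(0.0005))    # low noise: E_infty is small (good recovery)
print(state_evolution(0.05))      # high noise: E_infty stays near Var(S) (no recovery)
```

Since E[S] ≠ 0 here, SE does start from E⁰ = v, as discussed above.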
Thus Δ_AMP is the smallest solution of ∂i_RS/∂E = ∂²i_RS/∂E² = 0; in other words it is the "first" horizontal inflexion point appearing in i_RS(E; Δ) as Δ increases.

Discussion: With our hypothesis on P_0 there are only three possible scenarios: Δ_AMP < Δ_RS (one "first order" phase transition); Δ_AMP = Δ_RS < ∞ (one "higher order" phase transition); Δ_AMP = Δ_RS = ∞ (no phase transition). In the sequel we will have in mind the most interesting case, namely one first order phase transition, where we determine the gap between the algorithmic AMP and information theoretic performance. The cases of no phase transition or higher order phase transition, which present no algorithmic gap, are basically covered by the analysis of [3] and follow as a special case from our proof. The only cases that would require more work are those where P_0 is s.t (1) develops more than three stationary points and more than one phase transition is present.

For Δ_AMP < Δ_RS the structure of stationary points of (1) is as follows¹ (figure 1). There exist three branches E_good(Δ), E_unstable(Δ) and E_bad(Δ) s.t: 1) For 0 < Δ < Δ_AMP there is a single stationary point E_good(Δ) which is a global minimum; 2) At Δ_AMP a horizontal inflexion point appears; for Δ ∈ [Δ_AMP, Δ_RS] there are three stationary points satisfying E_good(Δ_AMP) < E_unstable(Δ_AMP) = E_bad(Δ_AMP), E_good(Δ) < E_unstable(Δ) < E_bad(Δ) otherwise, and moreover i_RS(E_good; Δ) ≤ i_RS(E_bad; Δ) with equality only at Δ_RS; 3) for Δ > Δ_RS there is at least the stationary point E_bad(Δ) which is always the global minimum, i.e. i_RS(E_bad; Δ) < i_RS(E_good; Δ).
(For higher Δ the E_good(Δ) and E_unstable(Δ) branches may merge and disappear); 4) E_good(Δ) is analytic for Δ ∈ ]0, Δ′[, Δ′ > Δ_RS, and E_bad(Δ) is analytic for Δ > Δ_AMP.

We note for further use in the proof section that E_∞ = E_good(Δ) for Δ < Δ_AMP and E_∞ = E_bad(Δ) for Δ > Δ_AMP. Definition 1.4 is equivalent to Δ_AMP = sup{Δ | E_∞ = E_good(Δ)}. Moreover we will also use that i_RS(E_good; Δ) is analytic on ]0, Δ′[, i_RS(E_bad; Δ) is analytic on ]Δ_AMP, ∞[, and the only non-analyticity point of min_{E∈[0,v]} i_RS(E; Δ) is at Δ_RS.

Relation to other works: Explicit single-letter characterization of the MI in the rank-one problem has attracted a lot of attention recently. Particular cases of theorem 1.1 have been shown rigorously in a number of situations. A special case when s_i = ±1 ∼ Ber(1/2) already appeared in [13], where an equivalent spin glass model is analysed. Very recently, [9] has generalized the results of [13] and, notably, obtained a generic matching upper bound. The same formula has also been rigorously computed following the study of AMP in [3] for spiked models (provided, however, that the signal was not too sparse) and in [4] for strictly symmetric community detection.

¹We take E[S] ≠ 0. Once theorem 1.1 is proven for this case, a limiting argument allows us to extend it to E[S] = 0.

Figure 1: The replica symmetric potential i_RS(E) for four values of Δ in the Wigner spiked model.
The MI is min i_RS(E) (the black dot, while the black cross corresponds to the local minimum) and the asymptotic matrix-MMSE is v² − (v − argmin_E i_RS(E))², where v = ρ in this case with ρ = 0.02 as in the inset of figure 2. From top left to bottom right: (1) For low noise values, here Δ = 0.0008 < Δ_AMP, there exists a unique "good" minimum corresponding to the MMSE and AMP is Bayes optimal. (2) As the noise increases, a second local "bad" minimum appears: this is the situation at Δ_AMP < Δ = 0.0012 < Δ_RS. (3) For Δ = 0.00125 > Δ_RS, the "bad" minimum becomes the global one and the MMSE suddenly deteriorates. (4) For larger values of Δ, only the "bad" minimum exists. AMP can be seen as a naive minimizer of this curve starting from E = v = 0.02. It reaches the global minimum in situations (1), (3) and (4), but in (2), when Δ_AMP < Δ < Δ_RS, it is trapped by the local minimum with large MSE instead of reaching the global one corresponding to the MMSE.

For rank-one symmetric matrix estimation problems, AMP has been introduced by [6], who also computed the SE formula to analyse its performance, generalizing techniques developed by [11] and [12]. SE was further studied by [3] and [4]. In [7, 10], the generalization to larger rank was also considered. The general formula proposed by [10] for the conditional entropy and the MMSE on the basis of the heuristic cavity method from statistical physics was not demonstrated in full generality. Worse, all existing proofs could not reach the more interesting regime where a gap between the algorithmic and information theoretic performances appears, leaving a gap with the statistical physics conjectured formula (and the rigorous upper bound from [9]).
Our result settles this conjecture and has interesting non-trivial implications for the computational complexity of these tasks.

Our proof technique combines recent rigorous results in coding theory on capacity-achieving spatially coupled codes [14, 15, 16, 17] with other progress coming from developments in mathematical physics putting predictions of spin glass theory on a rigorous basis [18]. From this point of view, the theorem proved in this paper is relevant in a broader context going beyond low-rank matrix estimation. Hundreds of papers have been published in statistics, machine learning or information theory using the non-rigorous statistical physics approach. We believe that our result helps set a rigorous foundation for a broad line of work. While we focus on rank-one symmetric matrix estimation, our proof technique is readily extendable to more generic low-rank symmetric matrix or low-rank symmetric tensor estimation. We also believe that it can be extended to other problems of interest in machine learning and signal processing, such as generalized linear regression, features/dictionary learning, compressed sensing or multi-layer neural networks.

2 Two examples: Wigner spiked model and community detection

In order to illustrate the consequences of our results we shall present two examples.

Wigner spiked model: In this model, the vector s is a Bernoulli random vector, S_i ∼ Ber(ρ). For large enough densities (i.e. ρ > 0.041(1)), [3] computed the matrix-MMSE and proved that AMP is a computationally efficient algorithm that asymptotically achieves the matrix-MMSE for any value of the noise Δ. Our results allow us to close the gap left open by [3]: on one hand we now obtain rigorously the MMSE for ρ ≤ 0.041(1), and on the other hand we observe that for such values of ρ, and as Δ decreases, there is a small region where two local minima coexist in i_RS(E; Δ).
In particular, for Δ_AMP < Δ < Δ_Opt = Δ_RS the global minimum corresponding to the MMSE differs from the local one that traps AMP, and a computational gap appears (see figure 1). While the region where AMP is Bayes optimal is quite large, the region where it is not, however, is perhaps the most interesting one. While this is by no means evident, statistical physics analogies with physical phase transitions in nature suggest that this region should be hard for a very broad class of algorithms.

Figure 2: Phase diagram in the noise variance Δ versus density ρ plane for the rank-one spiked Wigner model (left) and the asymmetric community detection (right). Left: [3] proved that AMP achieves the matrix-MMSE for all Δ as long as ρ > 0.041(1). Here we show that AMP is actually achieving the optimal reconstruction in the whole phase diagram except in the small region between the blue and red lines. Notice the large gap with spectral methods (dashed black line). Inset: matrix-MMSE (blue) at ρ = 0.02 as a function of Δ. AMP (dashed red) provably achieves the matrix-MMSE except in the region Δ_AMP < Δ < Δ_Opt = Δ_RS. We conjecture that no polynomial-time algorithm will do better than AMP in this region. Right: Asymmetric community detection problem with two communities. For ρ > 1/2 − √(1/12) (black point) and when Δ > 1, it is information theoretically impossible to find any overlap with the true communities and the matrix-MMSE is 1, while it becomes possible for Δ < 1. In this region, AMP is always achieving the matrix-MMSE and spectral methods can find a non-trivial overlap with the truth as well, starting from Δ < 1. For ρ < 1/2 − √(1/12), however, it is information theoretically possible to find an overlap with the hidden communities for Δ > 1 (below the blue line), but both AMP and spectral methods miss this information. Inset: matrix-MMSE (blue) at ρ = 0.05 as a function of Δ. AMP (dashed red) again provably achieves the matrix-MMSE except in the region Δ_AMP < Δ < Δ_Opt.

For small ρ our results are consistent with the known optimal and algorithmic thresholds predicted in sparse PCA [19, 20], which treats the case of sub-extensive ρ = o(1) values. Another interesting line of work for such probabilistic models appeared in the context of random matrix theory (see [8] and references therein) and predicts that a sharp phase transition occurs at a critical value of the noise, Δ_spectral = ρ², below which an outlier eigenvalue (and its principal eigenvector) has a positive correlation with the hidden signal. For larger noise values the spectral distribution of the observation is indistinguishable from that of the pure random noise.

Asymmetric balanced community detection: We now consider the problem of detecting two communities (groups) with different sizes ρn and (1 − ρ)n, which generalizes the setting considered in [4]. One is given a graph where the probability to have a link between nodes in the first group is p + μ(1 − ρ)/(ρ√n), between those in the second group is p + μρ/((1 − ρ)√n), while inter-connections appear with probability p − μ/√n. With this peculiar "balanced" setting, the nodes in each group have the same degree distribution with mean pn, making them harder to distinguish.
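The spectral transition at Δ_spectral = ρ² described above is easy to observe numerically. A quick sketch (our illustration; the size n = 1500 and the parameter values are arbitrary) measures the overlap between the leading eigenvector of the observation and the hidden signal on both sides of the transition:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_overlap(n, rho, delta):
    """Overlap between the leading eigenvector of w = ss^T/sqrt(n) + sqrt(delta)*z and s."""
    s = (rng.random(n) < rho).astype(float)        # Bernoulli(rho) signal
    z = rng.standard_normal((n, n))
    z = (z + z.T) / np.sqrt(2.0)                   # symmetric noise, entries ~ N(0, 1)
    w = np.outer(s, s) / np.sqrt(n) + np.sqrt(delta) * z
    eigvals, eigvecs = np.linalg.eigh(w)           # ascending order
    v1 = eigvecs[:, -1]                            # leading eigenvector
    return abs(v1 @ s) / np.linalg.norm(s)

print(top_overlap(1500, 0.3, 0.01))  # below Delta_spectral = rho^2 = 0.09: large overlap
print(top_overlap(1500, 0.3, 0.5))   # above the transition: overlap near 0
```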
According to the universality property described in the first section, this is equivalent to a model with AWGN of variance Δ = p(1 − p)/μ² where each variable s_i is chosen according to P_0(s) = ρ δ(s − √((1−ρ)/ρ)) + (1 − ρ) δ(s + √(ρ/(1−ρ))). Our results for this problem² are summarized on the right hand side of figure 2. For ρ > ρ_c = 1/2 − √(1/12) (black point), it is asymptotically information theoretically possible to get an estimation better than chance if and only if Δ < 1. When ρ < ρ_c, however, it becomes possible for much larger values of the noise. Interestingly, AMP and spectral methods have the same transition and can find a positive correlation with the hidden communities for Δ < 1, regardless of the value of ρ. Again, a region [Δ_AMP, Δ_Opt = Δ_RS] exists where a computational gap appears when ρ < ρ_c. One can investigate the very low ρ regime, where we find that the information theoretic transition goes as Δ_Opt(ρ → 0) = 1/(4ρ|log ρ|). Now if we assume that this result stays true even for ρ = o(1) (which is a speculation at this point), we can choose μ → (1 − p)ρ√n such that the small group is a clique. Then the problem corresponds to a "balanced" version of the famous planted clique problem [21].
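A quick numerical check of the mapping above (our illustration; the values of ρ, p, μ are arbitrary): the community prior has zero mean, so both groups have the same mean degree, its second moment is v = 1, and the equivalent AWGN noise is Δ = p(1 − p)/μ²:

```python
import numpy as np

rho, p, mu = 0.3, 0.1, 0.02
a = np.array([np.sqrt((1 - rho) / rho), -np.sqrt(rho / (1 - rho))])  # support of P0
w = np.array([rho, 1 - rho])                                          # weights of P0

mean = np.sum(w * a)           # = 0 (up to float error): balanced degrees
v = np.sum(w * a**2)           # = 1: second moment entering i_RS
delta = p * (1 - p) / mu**2    # effective AWGN variance of the equivalent model
rho_c = 0.5 - np.sqrt(1 / 12)  # critical density, approximately 0.2113
print(mean, v, delta, rho_c)
```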
We find that the AMP/spectral approach finds the hidden clique when it is larger than √(np/(1 − p)), while the information theoretic transition translates into a size of the clique 4p log(n)/(1 − p). This is indeed reminiscent of the more classical planted clique problem at p = 1/2 with its gap between log(n) (information theoretic), √(n/e) (AMP [22]) and √n (spectral [21]). Since in our balanced case the spectral and AMP limits match, this suggests that the small gain of AMP in the standard clique problem is simply due to the information provided by the distribution of local degrees in the two groups (which is absent in our balanced case). We believe this correspondence strengthens the claim that the AMP gap is actually a fundamental one.

²Note that here, since E = v = 1 is an extremum of i_RS(E; Δ), one must introduce a small bias in P_0 and let it then tend to zero at the end of the proofs.

3 Proofs

The crux of our proof rests on an auxiliary "spatially coupled system". The hallmark of spatially coupled models is that one can tune them so that the gap between the algorithmic and information theoretic limits is eliminated, while at the same time the MI is maintained unchanged for the coupled and original models. Roughly speaking, this means that it is possible to algorithmically compute the information theoretic limit of the original model because a suitable algorithm is optimal on the coupled system.
The present spatially coupled construction is similar to the one used for the coupled Curie-Weiss model [14]. Consider a ring of length L + 1 (L even) with blocks positioned at μ ∈ {0, …, L} and coupled to neighboring blocks {μ − w, …, μ + w}. Positions μ are taken modulo L + 1 and the integer w ∈ {0, …, L/2} equals the size of the coupling window. The coupled model is

w_{iμjν} = s_{iμ} s_{jν} √(Λ_{μν}/n) + z_{iμjν} √Δ,    (3)

where the index iμ ∈ {1, …, n} (resp. jν) belongs to the block μ (resp. ν) along the ring, Λ is an (L+1)×(L+1) matrix which describes the strength of the coupling between blocks, and Z_{iμjν} ∼ N(0, 1) are i.i.d. For the proof to work, the matrix elements have to be chosen appropriately. We assume that: i) Λ is a doubly stochastic matrix; ii) Λ_{μν} depends on |μ − ν|; iii) Λ_{μν} is not vanishing for |μ − ν| ≤ w and vanishes for |μ − ν| > w; iv) Λ is smooth in the sense |Λ_{μν} − Λ_{μ+1ν}| = O(w⁻²); v) Λ has a non-negative Fourier transform. All these conditions can easily be met, the simplest example being a triangle of base 2w + 1 and height 1/(w + 1). The construction of the coupled system is completed by introducing a seed in the ring: we assume perfect knowledge of the signal components {s_{iμ}} for μ ∈ B := {−w − 1, …, w − 1} mod L + 1. This seed is what allows us to close the gap between the algorithmic and information theoretic limits and therefore plays a crucial role. Note it can also be viewed as an "opening" of the chain with fixed boundary conditions.
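Conditions i)–v) are easy to verify for the triangle example; a minimal sketch (our illustration, with arbitrary L = 20 and w = 3):

```python
import numpy as np

L, w = 20, 3
mus = np.arange(L + 1)
# Ring distance |mu - nu| taken modulo L+1
dist = np.minimum(np.abs(mus[:, None] - mus[None, :]),
                  L + 1 - np.abs(mus[:, None] - mus[None, :]))
# Triangle of base 2w+1; normalization (w+1)^2 makes each row sum to 1
lam = np.clip(w + 1.0 - dist, 0.0, None) / (w + 1.0)**2

assert np.allclose(lam.sum(axis=0), 1.0)   # i) doubly stochastic (columns)
assert np.allclose(lam.sum(axis=1), 1.0)   # i) doubly stochastic (rows)
assert np.all(lam[dist > w] == 0)          # iii) vanishes outside the window
assert np.all(lam[dist <= w] > 0)          # iii) positive inside the window
print(lam[0, 0])                           # peak value = 1/(w+1) = 0.25
print(np.fft.fft(lam[0]).real.min())       # v) Fourier transform non-negative (up to float error)
```

The triangle kernel is the autocorrelation of a box of width w + 1, which is why its Fourier transform is automatically non-negative.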
Our first crucial result states that the MI I_{w,L}(S; W) of the coupled and original systems are the same in a suitable limit.

Lemma 3.1 (Equality of mutual informations) For any fixed w the following limits exist and are equal: lim_{L→∞} lim_{n→∞} I_{w,L}(S; W)/(n(L+1)) = lim_{n→∞} I(S; W)/n.

An immediate corollary is that the non-analyticity points (w.r.t Δ) of the MIs are the same in the coupled and original models. In particular, defining Δ_Opt,coup := sup{Δ | lim_{L→∞} lim_{n→∞} I_{w,L}(S; W)/(n(L+1)) is analytic in ]0, Δ[}, we have Δ_Opt,coup = Δ_Opt.

The second crucial result states that the AMP threshold of the spatially coupled system is at least as good as Δ_RS. The analysis of AMP applies to the coupled system as well [11, 12] and it can be shown that the performance of AMP is assessed by SE. Let E^t_μ := lim_{n→∞} E_{S,Z}[‖S_μ − ŝ^t_μ‖²₂]/n be the asymptotic average vector-MSE of the AMP estimate ŝ^t_μ at time t for the μ-th "block" of S. We associate to each position μ ∈ {0, …, L} an independent scalar system with AWGN of the form Y = S + Σ_μ(E; Δ)Z, with Σ_μ(E; Δ)² := Δ/(v − Σ_{ν=0}^L Λ_{μν}E_ν) and S ∼ P_0, Z ∼ N(0, 1). Taking into account knowledge of the signal components in B, SE reads:

E^{t+1}_μ = mmse(Σ_μ(E^t; Δ)⁻²), E⁰_μ = v for μ ∈ {0, …, L} \ B;  E^t_μ = 0 for μ ∈ B, t ≥ 0,    (4)

where the mmse function is defined as in section 1. From the monotonicity of the mmse function we have E^{t+1}_μ ≤ E^t_μ for all μ ∈ {0, …, L}, a partial order which implies that lim_{t→∞} E^t = E^∞ exists. This allows us to define an algorithmic threshold for the coupled system: Δ_AMP,w,L := sup{Δ | E^∞_μ ≤ E_good(Δ) ∀μ}. We show (equality holds but is not directly needed):

Lemma 3.2 (Threshold saturation) Let Δ_AMP,coup := lim inf_{w→∞} lim inf_{L→∞} Δ_AMP,w,L. We have Δ_AMP,coup ≥ Δ_RS.

Proof sketch of theorem 1.1: First we prove the RS formula for Δ ≤ Δ_Opt. It is known [3] that the matrix-MSE of AMP when n → ∞ is equal to v² − (v − E^t)². This cannot improve the matrix-MMSE, hence (v² − (v − E_∞)²)/4 ≥ lim sup_{n→∞} Mmmse_n/4. For Δ ≤ Δ_AMP we have E_∞ = E_good(Δ), which is the global minimum of (1), so the left hand side of the last inequality equals the derivative of min_{E∈[0,v]} i_RS(E; Δ) w.r.t Δ⁻¹. Thus using the matrix version of the I-MMSE relation [23] we get

(d/dΔ⁻¹) min_{E∈[0,v]} i_RS(E; Δ) ≥ lim sup_{n→∞} (1/n) dI(S; W)/dΔ⁻¹.    (5)

Integrating this relation on [0, Δ] ⊂ [0, Δ_AMP] and checking that min_{E∈[0,v]} i_RS(E; 0) = H(S) (the Shannon entropy of P_0), we obtain min_{E∈[0,v]} i_RS(E; Δ) ≤ lim inf_{n→∞} I(S; W)/n. But we know I(S; W)/n ≤ min_{E∈[0,v]} i_RS(E; Δ) [9], thus we already get theorem 1.1 for Δ ≤ Δ_AMP. We notice that Δ_AMP ≤ Δ_Opt. While this might seem intuitively clear, it follows from Δ_RS ≥ Δ_AMP (by their definitions), which together with Δ_AMP > Δ_Opt would imply from theorem 1.1 that lim_{n→∞} I(S; W)/n is analytic at Δ_Opt, a contradiction. The next step is to extend theorem 1.1 to the range [Δ_AMP, Δ_Opt].
Suppose for a moment ∆_{RS} ≥ ∆_{Opt}. Then both functions on each side of the RS formula are analytic on the whole range ]0,∆_{Opt}[ and since they are equal for ∆ ≤ ∆_{AMP}, they must be equal on their whole analyticity range and, by continuity, they must also be equal at ∆_{Opt} (that the functions are continuous follows from independent arguments on the existence of the n→∞ limit of concave functions). It remains to show that ∆_{RS} ∈ ]∆_{AMP},∆_{Opt}[ is impossible. We proceed by contradiction, so suppose this is true. Then both functions on each side of the RS formula are analytic on ]0,∆_{RS}[ and since they are equal for ]0,∆_{AMP}[ ⊂ ]0,∆_{RS}[ they must be equal on the whole range ]0,∆_{RS}[ and also at ∆_{RS} by continuity. For ∆ > ∆_{RS} the fixed point of SE is E^∞ = E_{bad}(∆), which is also the global minimum of i_{RS}(E;∆), hence (5) is verified. Integrating this inequality on ]∆_{RS},∆[ ⊂ ]∆_{RS},∆_{Opt}[ and using I(S;W)/n ≤ min_{E∈[0,v]} i_{RS}(E;∆) again, we find that the RS formula holds for all ∆ ∈ [0,∆_{Opt}]. But this implies that min_{E∈[0,v]} i_{RS}(E;∆) is analytic at ∆_{RS}, a contradiction.

We now prove the RS formula for ∆ ≥ ∆_{Opt}. Note that the previous arguments showed that necessarily ∆_{Opt} ≤ ∆_{RS}. Thus by lemmas 3.1 and 3.2 (and the sub-optimality of AMP shown before) we obtain ∆_{RS} ≤ ∆_{AMP,coup} ≤ ∆_{Opt,coup} = ∆_{Opt} ≤ ∆_{RS}. This shows that ∆_{Opt} = ∆_{RS} (this is the point where spatial coupling comes into play, and we do not know of other means to prove such an equality).
For \u2206 > \u2206RS we have E\u221e = Ebad(\u2206) which is the global minimum of iRS(E; \u2206).\nTherefore we again have (5) in this range and the proof can be completed by using once more the\nintegration argument, this time over the range [\u2206RS, \u2206] = [\u2206Opt, \u2206].\nProof sketch of corollaries 1.3 and 1.5: Let E\u2217(\u2206) =argminEiRS(E; \u2206) for \u2206(cid:54)= \u2206RS. By explicit\ncalculation one checks that diRS(E\u2217, \u2206)/d\u2206\u22121 = (v2\u2212(v\u2212E\u2217(\u2206))2)/4, so from theorem 1.1 and\nthe matrix form of the I-MMSE relation we \ufb01nd Mmmsen\u2192 v2\u2212(v\u2212E\u2217(\u2206))2 as n\u2192\u221e which is\nthe \ufb01rst part of the statement of corollary 1.3. Let us now turn to corollary 1.5. For n\u2192\u221e the vector-\nMSE of the AMP estimator at time t equals Et, and since the \ufb01xed point equation corresponding to\nSE is precisely the stationarity equation for iRS(E; \u2206), we conclude that for \u2206 /\u2208 [\u2206AMP, \u2206RS] we\nmust have E\u221e = E\u2217(\u2206). It remains to prove that E\u2217(\u2206) = limn\u2192\u221e Vmmsen(\u2206) at least for \u2206 /\u2208\n[\u2206AMP, \u2206RS] (we believe this is in fact true for all \u2206). This will settle the second part of corollary 1.3\nas well as 1.5. Using (Nishimori) identities ES,W[SiSjE[XiXj|W]] =ES,W[E[XiXj|W]2] (see e.g.\n[9]) and using the law of large numbers we can show limn\u2192\u221e Mmmsen \u2264 limn\u2192\u221e(v2 \u2212 (v\u2212\nVmmsen(\u2206))2). Concentration techniques similar to [13] suggest that the equality in fact holds\n(for \u2206 (cid:54)= \u2206RS) but there are technicalities that prevent us from completing the proof of equality.\nHowever it is interesting to note that this equality would imply E\u2217(\u2206) = limn\u2192\u221e Vmmsen(\u2206) for\nall \u2206(cid:54)= \u2206RS. Nevertheless, another argument can be used when AMP is optimal. 
On one hand the right hand side of the inequality is necessarily smaller than v² − (v−E^∞)². On the other hand the left hand side of the inequality is equal to v² − (v−E*(∆))². Since E*(∆) = E^∞ when ∆ ∉ [∆_{AMP},∆_{RS}], we can conclude lim_{n→∞} Vmmse_n(∆) = argmin_E i_{RS}(E;∆) for this range of ∆.

Proof sketch of lemma 3.1: Here we prove the lemma for a ring that is not seeded. An easy argument shows that a seed of size w does not change the MI per variable when L→∞. The statistical physics formulation is convenient: up to the trivial additive term n(L+1)v²/4, the MI I_{w,L}(S;W) equals the free energy −E_{S,Z}[ln Z_{w,L}], where Z_{w,L} := ∫ dx P_0(x) exp(−H(x,z,Λ)) and

H(x,z,Λ) = (1/∆) Σ_{µ=0}^{L} ( Λ_{µµ} Σ_{i_µ≤j_µ} A_{i_µ j_µ}(x,z,Λ) + Σ_{ν=µ+1}^{µ+w} Λ_{µν} Σ_{i_µ,j_ν} A_{i_µ j_ν}(x,z,Λ) ),   (6)
with A_{i_µ j_ν}(x,z,Λ) := (x_{i_µ}² x_{j_ν}²)/(2n) − (s_{i_µ} s_{j_ν} x_{i_µ} x_{j_ν})/n − (x_{i_µ} x_{j_ν} z_{i_µ j_ν} √∆)/√(nΛ_{µν}). Consider a pair of systems with coupling matrices Λ and Λ′ and i.i.d noise realizations z, z′, an interpolated Hamiltonian H(x,z,tΛ) + H(x,z′,(1−t)Λ′), t ∈ [0,1], and the corresponding partition function Z_t. The main idea of the proof is to show that for suitable choices of matrices, −(d/dt) E_{S,Z,Z′}[ln Z_t] ≤ 0 for all t ∈ [0,1] (up to negligible terms), so that by the fundamental theorem of calculus we get a comparison between the free energies of H(x,z,Λ) and H(x,z′,Λ′). Performing the t-derivative brings down a Gibbs average of a polynomial in all variables s_{i_µ}, x_{i_µ}, z_{i_µ j_ν} and z′_{i_µ j_ν}. The expectation over S, Z, Z′ of this Gibbs average is simplified using integration by parts over the Gaussian noise z_{i_µ j_ν}, z′_{i_µ j_ν} and Nishimori identities (see e.g. the proof of corollary 1.3 for one of them). This algebra leads to

−(1/(n(L+1))) (d/dt) E_{S,Z,Z′}[ln Z_t] = (1/(4∆(L+1))) E_{S,Z,Z′}[⟨q⊺Λq − q⊺Λ′q⟩_t] + O(1/(nL)),   (7)

where ⟨−⟩_t is the Gibbs average w.r.t the interpolated Hamiltonian and q is the vector of overlaps q_µ := Σ_{i_µ=1}^{n} s_{i_µ} x_{i_µ}/n. If we can choose matrices s.t Λ′ > Λ, the difference of quadratic forms in the Gibbs bracket is negative and we obtain an inequality in the large size limit. We use this scheme to interpolate between the fully decoupled system w = 0 and the coupled one 1 ≤ w < L/2, and then between 1 ≤ w < L/2 and the fully connected system w = L/2.
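The matrix choices used in this interpolation rest on simple spectral facts: the eigenvalues of a translation-invariant (circulant) stochastic matrix are the discrete Fourier transform of its row, so when that transform is non-negative the matrix sits between the identity and the uniform projector as a quadratic form. A quick numerical sanity check, using a triangular (Fejér-type) window as one such matrix and arbitrary test sizes:

```python
import numpy as np

# Numerical check (illustration only; sizes are arbitrary) of the ordering
#   Identity  >=  Lambda  >=  J/(L+1)   as quadratic forms,
# for a stochastic translation-invariant Lambda with non-negative DFT.
L, w = 20, 3
N = L + 1
idx = np.arange(N)
dist = np.minimum(np.abs(idx[:, None] - idx[None, :]),
                  N - np.abs(idx[:, None] - idx[None, :]))   # ring distance
weights = np.maximum(0, w + 1 - dist)          # triangular (Fejer) window
Lam = weights / weights.sum(axis=1, keepdims=True)  # circulant and stochastic

# Eigenvalues of a circulant matrix are the DFT of its first row: here one
# eigenvalue equals 1 (row sums) and all others lie in [0, 1[.
eigs = np.sort(np.real(np.fft.fft(Lam[0])))
assert np.all(eigs >= -1e-12) and np.all(eigs <= 1 + 1e-12)

J = np.ones((N, N)) / N    # w = L/2 case: projector, eigenvalues (0, ..., 0, 1)
Id = np.eye(N)             # w = 0 case: eigenvalues (1, ..., 1)
assert np.linalg.eigvalsh(Id - Lam).min() >= -1e-12   # Id - Lam is PSD
assert np.linalg.eigvalsh(Lam - J).min() >= -1e-12    # Lam - J/(L+1) is PSD
```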
The w = 0 system has Λ_{µν} = δ_{µν} with eigenvalues (1, 1, . . . , 1). For the 1 ≤ w < L/2 system, we take any stochastic translation invariant matrix with non-negative discrete Fourier transform (of its rows): such matrices have one eigenvalue equal to 1 and all others in [0, 1[ (the eigenvalues are precisely the discrete Fourier transform). For w = L/2 we choose Λ_{µν} = 1/(L+1), which is a projector with eigenvalues (0, 0, . . . , 1). With these choices we deduce that the free energies and MIs are ordered as I_{w=0,L} + O(1) ≤ I_{w,L} + O(1) ≤ I_{w=L/2,L} + O(1). To conclude the proof we divide by n(L+1) and note that the limits of the leftmost and rightmost MIs are equal, provided the limit exists. Indeed the leftmost term equals (L+1) times I(S;W) and the rightmost term is the same MI for a system of n(L+1) variables. Existence of the limit follows by subadditivity, proven by a similar interpolation [18].

Proof sketch of lemma 3.2: Fix ∆ < ∆_{RS}. We show that, for w large enough, the coupled SE recursion (4) must converge to a fixed point with E^∞_µ ≤ E_{good}(∆) for all µ. The main intuition behind the proof is to use a “potential function” whose “energy” can be lowered by a small perturbation of a fixed point that would go above E_{good}(∆) [16, 17]. The relevant potential function i_{w,L}(E;∆) is in fact the replica potential of the coupled system (a generalization of (1)). The stationarity condition for this potential is precisely (4) (without the seeding condition). Monotonicity properties of SE ensure that any fixed point has a “unimodal” shape (and recall that it vanishes for µ ∈ B = {0, . . . , w−1} ∪ {L−w, . . . , L}). Consider a position µ_max ∈ {w, . . . , L−w−1} where the fixed point is maximal and suppose that E^∞_{µ_max} > E_{good}(∆). We associate to the fixed point E^∞ a so-called saturated profile E^s defined on the whole of Z as follows: E^s_µ = E_{good}(∆) for all µ ≤ µ_∞, where µ_∞ + 1 is the smallest position s.t E^∞_µ > E_{good}(∆); E^s_µ = E^∞_µ for µ ∈ {µ_∞+1, . . . , µ_max−1}; E^s_µ = E^∞_{µ_max} for all µ ≥ µ_max. We show that E^s cannot exist for w large enough. To this end define a shift operator by [S(E^s)]_µ := E^s_{µ−1}. On one hand the shifted profile is a small perturbation of E^s which matches a fixed point, except where it is constant, so if we Taylor expand, the first order vanishes and the second and higher orders can be estimated as |i_{w,L}(S(E^s);∆) − i_{w,L}(E^s;∆)| = O(1/w) uniformly in L. On the other hand, by explicit cancellation of telescopic sums, i_{w,L}(S(E^s);∆) − i_{w,L}(E^s;∆) = i_{RS}(E_{good};∆) − i_{RS}(E^∞_{µ_max};∆). Now one can show from monotonicity properties of SE that if E^∞ is a non trivial fixed point of the coupled SE then E^∞_{µ_max} cannot be in the basin of attraction of E_{good}(∆) for the uncoupled SE recursion. Consequently, as can be seen on the plot of i_{RS}(E;∆) (e.g. figure 1), we must have i_{RS}(E^∞_{µ_max};∆) ≥ i_{RS}(E_{bad};∆). Therefore i_{w,L}(S(E^s);∆) − i_{w,L}(E^s;∆) ≤ −|i_{RS}(E_{bad};∆) − i_{RS}(E_{good};∆)|, which is an energy gain independent of w, and for large enough w we get a contradiction with the previous estimate coming from the Taylor expansion.

Acknowledgments

J.B and M.D acknowledge funding from the SNSF (grant 200021-156672). Part of this research received funding from the ERC under the EU’s 7th Framework Programme (FP/2007-2013/ERC Grant Agreement 307087-SPARCS).
F.K and L.Z thank the Simons Institute for its hospitality.

References

[1] H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265–286, 2006.

[2] I.M. Johnstone and A.Y. Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 2012.

[3] Y. Deshpande and A. Montanari. Information-theoretically optimal sparse PCA. In IEEE Int. Symp. on Inf. Theory, pages 2197–2201, 2014.

[4] Y. Deshpande, E. Abbe, and A. Montanari. Asymptotic mutual information for the two-groups stochastic block model. arXiv:1507.08685, 2015.

[5] E.J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.

[6] S. Rangan and A.K. Fletcher. Iterative estimation of constrained rank-one matrices in noise. In IEEE Int. Symp. on Inf. Theory, pages 1246–1250, 2012.

[7] T. Lesieur, F. Krzakala, and L. Zdeborová. Phase transitions in sparse PCA. In IEEE Int. Symp. on Inf. Theory, page 1635, 2015.

[8] J. Baik, G. Ben Arous, and S. Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability, page 1643, 2005.

[9] F. Krzakala, J. Xu, and L. Zdeborová. Mutual information in rank-one matrix estimation. arXiv:1603.08447, 2016.

[10] T. Lesieur, F. Krzakala, and L. Zdeborová. MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel. In Annual Allerton Conference, 2015.

[11] M. Bayati and A. Montanari. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. on Inf. Theory, 57(2):764–785, 2011.

[12] A. Javanmard and A. Montanari. State evolution for general approximate message passing algorithms, with applications to spatial coupling. J. Infor. & Inference, 2:115, 2013.

[13] S.B. Korada and N. Macris. Exact solution of the gauge symmetric p-spin glass model on a complete graph. Journal of Statistical Physics, 136(2):205–230, 2009.

[14] S.H. Hassani, N. Macris, and R. Urbanke. Coupled graphical models and their thresholds. In IEEE Information Theory Workshop (ITW), 2010.

[15] S. Kudekar, T.J. Richardson, and R. Urbanke. Threshold saturation via spatial coupling: Why convolutional LDPC ensembles perform so well over the BEC. IEEE Trans. on Inf. Theory, 57, 2011.

[16] A. Yedla, Y.Y. Jian, P.S. Nguyen, and H.D. Pfister. A simple proof of Maxwell saturation for coupled scalar recursions. IEEE Trans. on Inf. Theory, 60(11):6943–6965, 2014.

[17] J. Barbier, M. Dia, and N. Macris. Threshold saturation of spatially coupled sparse superposition codes for all memoryless channels. CoRR, abs/1603.04591, 2016.

[18] F. Guerra. An introduction to mean field spin glass theory: methods and results. Mathematical Statistical Physics, pages 243–271, 2005.

[19] A.A. Amini and M.J. Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. In IEEE Int. Symp. on Inf. Theory, page 2454, 2008.

[20] Q. Berthet and P. Rigollet. Computational lower bounds for sparse PCA. arXiv:1304.0828, 2013.

[21] A. d’Aspremont, L. El Ghaoui, M.I. Jordan, and G.R.G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3):434, 2007.

[22] Y. Deshpande and A. Montanari. Finding hidden cliques of size √(N/e) in nearly linear time. Foundations of Computational Mathematics, 15(4):1069–1128, 2015.

[23] D. Guo, S. Shamai, and S. Verdú. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. on Inf. Theory, 51, 2005.