{"title": "Sparse Recovery with Brownian Sensing", "book": "Advances in Neural Information Processing Systems", "page_first": 1782, "page_last": 1790, "abstract": "We consider the problem of recovering the parameter alpha in R^K of a sparse function f, i.e. the number of non-zero entries of alpha is small compared to the number K of features, given noisy evaluations of f at a set of well-chosen sampling points. We introduce an additional randomisation process, called Brownian sensing, based on the computation of stochastic integrals, which produces a Gaussian sensing matrix, for which good recovery properties are proven independently of the number of sampling points N, even when the features are arbitrarily non-orthogonal. Under the assumption that f is Hölder continuous with exponent at least 1/2, we provide an estimate a of the parameter such that ||alpha - a||_2 = O(||eta||_2/\sqrt{N}), where eta is the observation noise. The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. We report numerical experiments illustrating our method."}

Sparse Recovery with Brownian Sensing

Alexandra Carpentier (INRIA Lille), alexandra.carpentier@inria.fr
Odalric-Ambrym Maillard (INRIA Lille), odalricambrym.maillard@gmail.com
Rémi Munos (INRIA Lille), remi.munos@inria.fr

Abstract

We consider the problem of recovering the parameter α ∈ R^K of a sparse function f (i.e. the number of non-zero entries of α is small compared to the number K of features) given noisy evaluations of f at a set of well-chosen sampling points. We introduce an additional randomization process, called Brownian sensing, based on the computation of stochastic integrals, which produces a Gaussian sensing matrix, for which good recovery properties are proven, independently of the number of sampling points N, even when the features are arbitrarily non-orthogonal.
Under the assumption that f is Hölder continuous with exponent at least 1/2, we provide an estimate α̂ of the parameter such that ‖α − α̂‖₂ = O(‖η‖₂/√N), where η is the observation noise. The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. We report numerical experiments illustrating our method.

1 Introduction

We consider the problem of sensing an unknown function f : X → R (where X ⊂ R^d), where f belongs to the span of a large set of (known) features {φ_k}_{1≤k≤K} of L²(X):

f(x) = Σ_{k=1}^K α_k φ_k(x),

where α ∈ R^K is the unknown parameter, assumed to be S-sparse, i.e. ‖α‖₀ := |{k : α_k ≠ 0}| ≤ S. Our goal is to recover α as accurately as possible.

In the setting considered here, we are allowed to select the points {x_n}_{1≤n≤N} ∈ X where the function f is evaluated, which results in the noisy observations

y_n = f(x_n) + η_n,   (1)

where η_n is an observation noise term. We assume that the noise is bounded, i.e., ‖η‖₂² := Σ_{n=1}^N η_n² ≤ σ². We write D_N = ({x_n, y_n}_{1≤n≤N}) for the set of observations, and we are interested in situations where N ≪ K, i.e., where the number of observations is much smaller than the number of features φ_k.

The question we wish to address is: how well can we recover α from a set of N noisy measurements? Note that whenever the noise is non-zero, the recovery cannot be perfect, so we wish to express the estimation error ‖α − α̂‖₂ in terms of N, where α̂ is our estimate.

The proposed method.
We address the problem of sparse recovery by combining two ideas:

• Sparse recovery theorems (see Section 2) essentially say that in order to recover a vector with a small number of measurements, one needs incoherence: the measurement basis, corresponding to the pointwise evaluations f(x_n), should be incoherent with the representation basis, i.e. the one on which the vector α is sparse. Interpreting these bases in terms of linear operators, pointwise evaluation of f is equivalent to measuring f with Dirac masses δ_{x_n}(f) := f(x_n). Since in general the representation basis {φ_k}_{1≤k≤K} is not incoherent with the measurement basis induced by the Dirac operators, we would like to consider another measurement basis, possibly randomized, so that it becomes incoherent with any representation basis.

• Since we are interested in reconstructing α, and since we assumed that f is linear in α, we can apply any set of M linear operators {T_m}_{1≤m≤M} to f = Σ_k α_k φ_k and consider the problem transformed by the operators; the parameter α is thus also the solution to the transformed problem T_m(f) = Σ_k α_k T_m(φ_k).

Thus, instead of considering the N×K sensing matrix Φ = (δ_{x_n}(φ_k))_{n,k}, we consider a new M×K sensing matrix A = (T_m(φ_k))_{m,k}, where the operators {T_m}_{1≤m≤M} enforce incoherence between the bases. Provided that we can estimate T_m(f) from the data set D_N, we will be able to recover α. The Brownian sensing approach followed here uses stochastic integral operators {T_m}_{1≤m≤M}, which makes the measurement basis incoherent with any representation basis, and generates a sensing matrix A which is Gaussian (with i.i.d.
rows).

The proposed algorithm (detailed in Section 3) recovers α by solving the system Aα ≈ b̂ by ℓ1 minimization¹, where b̂ ∈ R^M is an estimate, based on the noisy observations y_n, of the vector b ∈ R^M whose components are b_m = T_m f.

Contribution: Our contribution is a sparse recovery result for an arbitrary non-orthonormal functional basis {φ_k}_{k≤K} of a Hölder continuous function f. Theorem 4 states that our estimate α̂ satisfies ‖α − α̂‖₂ = O(‖η‖₂/√N) with high probability whatever N, under the assumption that the noise η is globally bounded, as in [3, 12]. This result is obtained by combining two contributions:

• We show that when the sensing matrix A is Gaussian, i.e. when each row of the matrix is drawn i.i.d. from a Gaussian distribution, orthonormality is not required for sparse recovery. This result, stated in Proposition 1 (and used in Step 1 of the proof of Theorem 4), is a consequence of Theorem 3.1 of [10].

• The sensing matrix A is made Gaussian by choosing the operators T_m to be stochastic integrals: T_m f := (1/√M) ∫_C f dB_m, where the B_m are Brownian motions, and C is a one-dimensional curve of X appropriately chosen according to the functions {φ_k}_{k≤K} (see the discussion in Section 4). We call A the Brownian sensing matrix.

The recovery property of the Brownian sensing matrix A depends only on the number of Brownian motions M used in the stochastic integrals, and not on the number of sampled points N. Note that M can be chosen arbitrarily large, as it is not linked to the limited amount of data, but M affects the overall computational complexity of the method.
The number of samples N appears only in the quality of the estimation of b, and this is where the assumption that f is Hölder continuous comes into the picture.

Outline: In Section 2, we survey the large body of existing results about sparse recovery and relate our contribution to this literature. In Section 3, we explain in detail the Brownian sensing recovery method sketched above and state our main result in Theorem 4. In Section 4, we first discuss our result and compare it with existing work; we then comment on the choice and influence of the sampling domain C on the recovery performance. Finally, in Section 5, we report numerical experiments illustrating the recovery properties of the Brownian sensing method, and its benefit compared to a straightforward application of compressed sensing when there is noise and very few sampling points.

¹Where the approximation sign ≈ refers to a minimization problem under a constraint coming from the observation noise.

2 Relation to existing results

A standard approach to recovering α is to consider the N×K matrix Φ = (φ_k(x_n))_{n,k} and solve the system Φα̂ ≈ y, where y is the vector with components y_n. Since N ≪ K, this is an ill-posed problem. Under the sparsity assumption, a successful idea is first to replace the initial problem with the well-defined problem of minimizing the ℓ0 norm of α under the constraint that Φα̂ ≈ y, and then, since this problem is NP-hard, to use a convex relaxation of the ℓ0 norm by replacing it with the ℓ1 norm. We then need to ensure that the relaxation provides the same solution as the initial problem making use of the ℓ0 norm.
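The ℓ0-to-ℓ1 relaxation above is the classical basis pursuit program. As a minimal, hypothetical illustration (the function name, toy sizes N = 15, K = 30, and the random instance are ours, not from the paper), the noiseless version min{‖a‖₁ : Φa = y} can be cast as a linear program by splitting a into its positive and negative parts:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||a||_1 subject to Phi a = y, via the standard LP split a = a+ - a-."""
    N, K = Phi.shape
    c = np.ones(2 * K)                  # objective: sum(a+) + sum(a-) = ||a||_1
    A_eq = np.hstack([Phi, -Phi])       # equality constraint: Phi (a+ - a-) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    assert res.success
    return res.x[:K] - res.x[K:]

rng = np.random.default_rng(0)
Phi = rng.standard_normal((15, 30))     # N = 15 measurements, K = 30 features
alpha = np.zeros(30)
alpha[[3, 17]] = [1.5, -2.0]            # 2-sparse ground truth
a_hat = basis_pursuit(Phi, Phi @ alpha) # noiseless observations y = Phi alpha
```

With 15 Gaussian measurements of a 2-sparse vector in R^30, the LP typically recovers α exactly up to solver tolerance; this is the phenomenon that the recovery conditions of the next theorems make precise.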
The literature on this problem is huge (see [3, 7, 8, 15, 18, 4, 11] for examples of papers that initiated this field of research).

Generally, we can decompose the reconstruction problem into two distinct sub-problems. The first sub-problem (a) is to state conditions on the matrix Φ ensuring that recovery is possible, and to derive bounds on the estimation error under such conditions.

The first important condition is the Restricted Isometry Property (RIP), introduced in [5], from which we can derive the following recovery result stated in [6]:

Theorem 1 (Candès et al., 2006) Let δ_S be the restricted isometry constant of Φ/√N, defined as δ_S = sup{ | ‖(Φ/√N)a‖₂²/‖a‖₂² − 1 | ; ‖a‖₀ ≤ S }. Then if δ_{3S} + 3δ_{4S} < 2, for every S-sparse vector α ∈ R^K, the solution α̂ to the ℓ1-minimization problem min{‖a‖₁ ; a satisfies ‖Φa − y‖₂² ≤ σ²} satisfies

‖α̂ − α‖₂² ≤ C_S σ²/N,

where C_S depends only on δ_{4S}.

Apart from the historical RIP, many other conditions emerged from works reporting the practical difficulty of having the RIP satisfied, and weaker conditions ensuring reconstruction were derived. See [17] for a precise survey of such conditions. A weaker condition for recovery is the compatibility condition, which leads to the following result from [16]:

Theorem 2 (van de Geer & Bühlmann, 2009) Assume that the compatibility condition is satisfied, i.e.
for a set S of indices of cardinality S and a constant L,

C(L, S) = min{ √S ‖(Φ/√N)α‖₂ / ‖α_S‖₁ ; α satisfies ‖α_{S^c}‖₁ ≤ L ‖α_S‖₁ } > 0.

Then for every S-sparse vector α ∈ R^K, the solution α̂ to the ℓ1-minimization problem min{‖α‖₁ ; α satisfies ‖α_{S^c}‖₁ ≤ L ‖α_S‖₁} satisfies, for C a numerical constant,

‖α̂ − α‖₁² ≤ C σ² log(K) / (C(L, S)² N).

The second sub-problem (b) of the global reconstruction problem is to provide the user with a simple way to efficiently sample the space in order to build a matrix Φ such that the conditions for recovery are fulfilled, at least with high probability. This can be difficult in practice, since it involves understanding the geometry of high-dimensional objects. For instance, to the best of our knowledge, there is no result explaining how to sample the space so that the corresponding sensing matrix Φ satisfies the nice recovery properties needed by the previous theorems, for a general family of features {φ_k}_{k≤K}.

However, it is proven in [12] that under some hypotheses on the functional basis, the matrix Φ satisfies the strong RIP property with high probability. This result, combined with a recovery result, is stated as follows:

Theorem 3 (Rauhut, 2010) Assume that {φ_k}_{k≤K} is an orthonormal basis of functions under a measure ν, bounded by a constant C_φ, and that we build D_N by sampling f at random according to ν. Assume also that the noise is bounded, ‖η‖₂ ≤ σ. If N/log(N) ≥ c₀ C_φ² S log(S)² log(K) and N ≥ c₁ C_φ² S log(p⁻¹), then with probability at least 1 − p, for every S-sparse vector α ∈ R^K, the solution α̂ to the ℓ1-minimization problem min{‖a‖₁ ; a satisfies ‖Φa − y‖₂² ≤ σ²} satisfies

‖α̂ − α‖₂² ≤ c₂ σ²/N,

where c₀, c₁ and c₂ are numerical constants.

In order to prove this theorem, the author of [12] showed that by sampling the points i.i.d. from ν, the resulting matrix Φ satisfies the RIP with high probability. The strong point of this theorem is that we do not need to check conditions on the matrix Φ to guarantee that it is RIP, which is infeasible in practice. But the weakness of the result is that the initial basis has to be orthonormal and bounded under the given measure ν in order to get the RIP satisfied: these two conditions ensure incoherence with the Dirac observation basis. The specific case of an unbounded basis, e.g. the Legendre polynomial basis, has been considered in [13], but to the best of our knowledge, the problem of designing a general sampling strategy such that the resulting sensing matrix possesses nice recovery properties in the case of a non-orthonormal basis remains unaddressed. Our contribution considers this case and is described in the following section.

3 The "Brownian sensing" approach

A need for incoherence. When the representation and observation bases are not incoherent, the sensing matrix Φ does not possess a nice recovery property.
A natural idea is to change the observation basis by introducing a set of M linear operators {T_m}_{m≤M} acting on the functions {φ_k}_{k≤K}. We have T_m(f) = Σ_{k=1}^K α_k T_m(φ_k) for all 1 ≤ m ≤ M, and our goal is to define the operators {T_m}_{m≤M} so that the sensing matrix (T_m(φ_k))_{m,k} enjoys a nice recovery property, whatever the representation basis {φ_k}_{k≤K}.

The Brownian sensing operators. We now consider linear operators defined by stochastic integrals on a one-dimensional curve C of X. First, we need to select a curve C ⊂ X of length l such that the covariance matrix V_C, defined by its elements (V_C)_{i,j} = ∫_C φ_i φ_j (for 1 ≤ i, j ≤ K), is invertible. We will discuss the existence of such a curve later, in Section 4. Then, we define the linear operators {T_m}_{1≤m≤M} as stochastic integrals over the curve C: T_m(g) := (1/√M) ∫_C g dB_m, where {B_m}_{m≤M} are M independent Brownian motions defined on C.

Note that, up to an appropriate speed-preserving parametrization g : [0, l] → X of C, we can work with the induced family {ψ_k}_{k≤K}, where ψ_k = φ_k ∘ g, instead of the family {φ_k}_{k≤K}.

The sensing method. With the choice of the linear operators {T_m}_{m≤M} defined above, the parameter α ∈ R^K satisfies the equation

Aα = b,   (2)

where b ∈ R^M is defined by its components b_m := T_m(f) = (1/√M) ∫_C f(x) dB_m(x), and the so-called Brownian sensing matrix A (of size M×K) has elements A_{m,k} := T_m(φ_k). Note that we do not require sampling f in order to compute the elements of A.
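Concretely, each entry A_{m,k} = (1/√M) ∫_C φ_k dB_m can be approximated by a Riemann-Stieltjes sum over a fine grid of C. The sketch below is our own illustration (the grid size, the small cosine family, and all names are assumptions, not the paper's code); as a sanity check it uses the Itô isometry, E[(∫_C φ dB)²] = ∫_C φ², which the Monte Carlo variance of one column should reproduce.

```python
import numpy as np

def brownian_sensing_matrix(features, M, l, n_grid, rng):
    """A[m, k] ~= (1/sqrt(M)) * sum_i phi_k(t_i) * (B_m(t_{i+1}) - B_m(t_i)) on [0, l]."""
    t = np.linspace(0.0, l, n_grid + 1)[:-1]              # left endpoints of the grid cells
    dt = l / n_grid
    dB = rng.standard_normal((M, n_grid)) * np.sqrt(dt)   # independent Brownian increments
    Phi_grid = np.stack([phi(t) for phi in features])     # K x n_grid feature values
    return (dB @ Phi_grid.T) / np.sqrt(M)                 # M x K Gaussian sensing matrix

rng = np.random.default_rng(1)
l = 2 * np.pi
features = [lambda t, k=k: (np.cos(t * k) + np.cos(t * (k + 1))) / np.sqrt(2 * np.pi)
            for k in range(1, 6)]                         # small non-orthogonal family
A = brownian_sensing_matrix(features, M=10000, l=l, n_grid=500, rng=rng)

# Ito isometry check: M * Var(A[:, 0]) should approach int_C phi_1(t)^2 dt (= 1 here)
tt = np.linspace(0.0, l, 200001)[:-1]
v_true = np.sum(features[0](tt) ** 2) * (l / 200000)      # quadrature for int phi_1^2
v_mc = 10000 * A[:, 0].var()                              # Monte Carlo estimate
```

Note that building A only requires the features φ_k and the simulated increments, never an evaluation of f, in line with the remark above.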
Thus, the samples only serve to estimate b, and for this purpose we sample f at points {x_n}_{1≤n≤N} regularly chosen along the curve C. In general, for a curve C with speed-preserving parametrization g : [0, l] → X, we take x_n = g(nl/N), and the resulting estimate b̂ ∈ R^M of b is defined by its components

b̂_m = (1/√M) Σ_{n=0}^{N−1} y_n (B_m(x_{n+1}) − B_m(x_n)).   (3)

Note that in the special case X = C = [0, 1], we simply have x_n = n/N.

The final step of the proposed method is to apply standard recovery techniques (e.g., ℓ1 minimization or Lasso) to compute α̂ for the system (2), where b is perturbed by the so-called sensing noise ε := b − b̂ (the estimation error of the stochastic integrals).

3.1 Properties of the transformed objects

We now give two properties of the Brownian sensing matrix A and of the sensing noise ε = b − b̂.

Brownian sensing matrix. By definition of the stochastic integral operators {T_m}_{m≤M}, the sensing matrix A = (T_m(φ_k))_{m,k} is a centered Gaussian matrix, with

Cov(A_{m,k}, A_{m,k′}) = (1/M) ∫_C φ_k(x) φ_{k′}(x) dx.

Moreover, by independence of the Brownian motions, each row A_{m,·} is drawn i.i.d. from a centered Gaussian distribution N(0, (1/M) V_C), where V_C is the K×K covariance matrix of the basis, defined by its elements (V_C)_{k,k′} = ∫_C φ_k(x) φ_{k′}(x) dx.
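To see how the pieces fit together, the following sketch (a toy setup of our own; all sizes and names are assumptions) computes b̂ from function evaluations via (3), using the same Brownian increments as for A. In the noiseless case, with samples taken at the integration grid points, the two Riemann-Stieltjes sums coincide and b̂ equals Aα exactly, so the sensing noise ε comes only from observation noise and from the discretization of the stochastic integral.

```python
import numpy as np

rng = np.random.default_rng(2)
l, K, M, N = 2 * np.pi, 5, 200, 400
features = [lambda t, k=k: (np.cos(t * k) + np.cos(t * (k + 1))) / np.sqrt(2 * np.pi)
            for k in range(1, K + 1)]
alpha = np.zeros(K)
alpha[[0, 3]] = [1.0, -0.5]                           # sparse ground truth
f = lambda t: sum(a * phi(t) for a, phi in zip(alpha, features))

x = np.linspace(0.0, l, N + 1)                        # sampling points x_n = n l / N
dB = rng.standard_normal((M, N)) * np.sqrt(l / N)     # increments B_m(x_{n+1}) - B_m(x_n)

# Sensing matrix: A[m, k] = (1/sqrt(M)) * sum_n phi_k(x_n) * dB[m, n]
Phi_grid = np.stack([phi(x[:-1]) for phi in features])
A = (dB @ Phi_grid.T) / np.sqrt(M)

# Estimate (3): b_hat[m] = (1/sqrt(M)) * sum_n y_n * dB[m, n]
y = f(x[:-1])                                         # noiseless observations here
b_hat = (dB @ y) / np.sqrt(M)
```

Since y = Φᵀα on the grid, b̂ = Aα holds exactly in this noiseless on-grid case, which is a convenient unit test for an implementation.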
Thanks to this nice structure, we can prove that A possesses a property similar to the RIP (in the sense of [10]) whenever M is large enough:

Proposition 1 For p > 0 and any integer t > 0, when M > 4C′(t log(K/t) + log(1/p)), with C′ a universal constant (defined in [14, 1]), then with probability at least 1 − p, for all t-sparse vectors x ∈ R^K,

(1/2) ν_min,C ‖x‖₂ ≤ ‖Ax‖₂ ≤ (3/2) ν_max,C ‖x‖₂,

where ν_max,C and ν_min,C are respectively the largest and smallest eigenvalues of V_C^{1/2}.

Sensing noise. In order to state our main result, we need a bound on ‖ε‖₂². We consider the simplest deterministic sensing design, where the sensing points are uniformly distributed along the curve C².

Proposition 2 Assume that ‖η‖₂² ≤ σ² and that f is (L, β)-Hölder, i.e.

∀(x, y) ∈ X², |f(x) − f(y)| ≤ L |x − y|^β.

Then for any p ∈ (0, 1], with probability at least 1 − p, we have the following bound on the sensing noise ε = b − b̂:

‖ε‖₂² ≤ σ̃²(N, M, p)/N,

where

σ̃²(N, M, p) := 2 (L² l^{2β}/N^{2β−1} + σ²) (1 + 2√(log(1/p)/M) + 4 log(1/p)/M).

Remark 1 The bound on the sensing noise ‖ε‖₂² contains two contributions: an approximation error term, which comes from the approximation of a stochastic integral with N points and scales as L² l^{2β}/N^{2β}, and an observation noise term of order σ²/N.
The observation noise term (when σ² > 0) dominates the approximation error term whenever β ≥ 1/2.

3.2 Main result

In this section, we state our main recovery result for the Brownian sensing method, described in Figure 1, using a uniform sampling method along a one-dimensional curve C ⊂ X ⊂ R^d. The proof of the following theorem can be found in the supplementary material.

Theorem 4 (Main result) Assume that f is (L, β)-Hölder on X and that V_C is invertible. Write the condition number κ_C = ν_max,C/ν_min,C, where ν_max,C and ν_min,C are respectively the largest and smallest eigenvalues of V_C^{1/2}, and write r = ⌈(3κ_C − 1) · 4/(√2 − 1)⌉². For any p ∈ (0, 1], let M ≥ 4c(4Sr log(K/(4Sr)) + log(1/p)) (where c is a universal constant defined in [14, 1]). Then, with probability at least 1 − 3p, the solution α̂ obtained by the Brownian sensing approach described in Figure 1 satisfies

‖α̂ − α‖₂² ≤ C κ_C⁴ (max_k ∫_C φ_k²) σ̃²(N, M, p)/N,

where C is a numerical constant and σ̃(N, M, p) is defined in Proposition 2.

Note that a similar result (not reported in this conference paper) can be proven in the case of i.i.d. sub-Gaussian noise, instead of the noise with bounded ℓ2 norm considered here.

²Note that other deterministic, random, or low-discrepancy sequences could be used here.

Input: a curve C of length l such that V_C is invertible. Parameters N and M.
• Select N uniform samples {x_n}_{1≤n≤N} along the curve C.
• Generate M Brownian motions {B_m}_{1≤m≤M} along C.
• Compute the Brownian sensing matrix A ∈ R^{M×K} (i.e. A_{m,k} = (1/√M) ∫_C φ_k(x) dB_m(x)).
• Compute the estimate b̂ ∈ R^M (i.e. b̂_m = (1/√M) Σ_{n=0}^{N−1} y_n (B_m(x_{n+1}) − B_m(x_n))).
• Find α̂, solution of min_a { ‖a‖₁ such that ‖Aa − b̂‖₂² ≤ σ̃²(N, M, p)/N }.

Figure 1: The Brownian sensing approach using a uniform sampling along the curve C.

4 Discussion

In this section we discuss the differences with previous results, especially with the work [12] recalled in Theorem 3. We then comment on the choice of the curve C and illustrate examples of such curves for different bases.

4.1 Comparison with known results

The order of the bound. Concerning the scaling of the estimation error in terms of the number of sensing points N, Theorem 3 of [12] (recalled in Section 2) states that when N is large enough (i.e., N = Ω(S log(K))), one can build an estimate α̂ such that ‖α̂ − α‖₂² = O(σ²/N). In comparison, our bound shows that ‖α̂ − α‖₂² = O(L² l^{2β}/N^{2β} + σ²/N) for any value of N. Thus, provided that the function f has a Hölder exponent β ≥ 1/2, we obtain the same rate as in Theorem 3.

A weak assumption about the basis. Note that our recovery performance scales with the condition number κ_C of V_C, as well as with the length l of the curve C. However, concerning the hypotheses on the functions {φ_k}_{k≤K}, we only assume that the covariance matrix V_C is invertible on the curve C, which enables us to handle arbitrarily non-orthonormal bases. This means that orthogonality of the basis functions is not a crucial requirement for sparse recovery. To the best of our knowledge, this is an improvement over previously known results (such as the work of [12]). Note however that if κ_C or l is too large, then the bound becomes loose.
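In practice, κ_C can be estimated numerically for a candidate curve by quadrature: compute V_C on a discretization of C, then take the eigenvalue ratio of V_C^{1/2}. A small sketch under our own assumptions (the basis, the grid, and the function names are illustrative); for a family that is orthonormal along C, V_C is the identity and κ_C = 1.

```python
import numpy as np

def condition_number_kappa(features, grid):
    """kappa_C = nu_max / nu_min over the eigenvalues of V_C^(1/2), where
    (V_C)_{ij} = int_C phi_i phi_j is approximated by a Riemann sum on `grid`."""
    dt = grid[1] - grid[0]
    F = np.stack([phi(grid) for phi in features])   # K x n values of the features
    V = F @ F.T * dt                                # quadrature for the covariance matrix
    ev = np.linalg.eigvalsh(V)                      # ascending eigenvalues of V_C
    assert ev[0] > 0, "V_C is not invertible along this curve"
    return np.sqrt(ev[-1] / ev[0])                  # eigenvalues of V^(1/2) are sqrt(ev)

n = 200000
grid = np.linspace(0.0, 1.0, n + 1)[:-1] + 0.5 / n  # midpoint rule on [0, 1]
ortho = [lambda t: np.ones_like(t),
         lambda t: np.sqrt(2) * np.cos(2 * np.pi * t),
         lambda t: np.sqrt(2) * np.sin(2 * np.pi * t)]   # orthonormal on [0, 1]
kappa = condition_number_kappa(ortho, grid)              # ~1 up to quadrature error
```

Running the same check on a non-orthogonal family (e.g. the cosine features of Section 5) yields κ_C > 1 and quantifies how much the bound, and the required M, degrade.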
Also, the computational complexity of Brownian sensing increases when κ_C is large, since it is then necessary to take a large M, i.e. to simulate more Brownian motions.

A result that holds without any condition on the number of sampling points. Theorem 4 requires a constraint on the number of Brownian motions M (namely, M = Ω(S log K)) and not on the number of sampling points N (as in [12], see Theorem 3). This is interesting in practical situations where we do not know the value of S, since we do not have to assume a lower bound on N to deduce the estimation error result. This is due to the fact that the Brownian sensing matrix A only depends on the computation of the M stochastic integrals of the K functions φ_k, and does not depend on the samples. The bound shows that we should take M as large as possible; however, M impacts the numerical cost of the method. In practice this implies a trade-off between a large M for a good estimation of α and a small M for a low numerical cost.

4.2 The choice of the curve

Why sample along a one-dimensional curve C instead of over the whole space X? In a bounded space X of dimension 1, the two approaches are identical. But in dimension d > 1, following the Brownian sensing approach while sampling over the whole space would require generating M Brownian sheets (the extension of Brownian motions to d > 1 dimensions) over X, and then building the M×K matrix A with elements A_{m,k} = ∫_X φ_k(t_1, ..., t_d) dB_1^m(t_1) ... dB_d^m(t_d). Assuming that the covariance matrix V_X is invertible, this Brownian sensing matrix is also Gaussian and enjoys the same recovery properties as in the one-dimensional case. However, in this case, estimating the stochastic integrals b_m = ∫_X f dB^m using sensing points along a (d-dimensional) grid would give an estimation error ε = b − b̂ that scales poorly with d, since we integrate over a d-dimensional space. This explains our choice of selecting a one-dimensional curve C instead of the whole space X and sampling N points along the curve. This choice indeed provides a better estimation of b, which is defined by one-dimensional stochastic integrals over C. Note that the only requirement on the curve C is that the covariance matrix V_C defined along it be invertible. In addition, in some specific applications the sampling process can be heavily constrained by the physical system, and sampling uniformly over the whole domain is typically costly. For example, in some medical experiments, e.g., scanner or MRI, it is only possible to sample along straight lines.

What the parameters of the curve tell us about a basis. In the result of Theorem 4, the length l of the curve C as well as the condition number κ_C = ν_max,C/ν_min,C are essential characteristics of the efficiency of the method. It is important to note that these two quantities are actually related: it may not be possible to find a short curve C such that κ_C is small. For instance, when the basis functions have compact support, if the curve C does not pass through the support of all functions, V_C is not invertible; any function whose support does not intersect the curve is an eigenvector of V_C with eigenvalue 0. This indicates that the method will not work well for a very localized basis {φ_k}_{k≤K} (e.g. wavelets with compact support), since the curve would then have to cover the whole domain, and l would be very large. On the other hand, the situation can be much nicer when the basis is not localized, as in the case of a Fourier basis. We show in the next subsection that for a d-dimensional Fourier basis, it is possible to find a curve C (actually a segment) such that the basis is orthonormal along the chosen line (i.e.
\u03baC = 1).\n\n4.3 Examples of curves\nFor illustration, we exhibit three cases for which one can easily derive a curve C such that VC is\ninvertible. The method described in the previous section will work with the following examples.\n\nIn this case, we simply take C = X , and the sparse recovery is possible\n\nX is a segment of R:\nwhenever the functions {\u03d5k}k\u2264K are linearly independent in L2.\nCoordinate functions: Consider\nthe case when the basis are the coordinate functions\nThen we can de\ufb01ne the parametrization of the curve C by g(t) =\n\u03d5k(t1, ...td) = tk.\n\u03b1(t)(t, t2, . . . , td), where \u03b1(t) is the solution to a differential equation such that \ufffdg\ufffd(t)\ufffd2 = 1 (which\nimplies that for any function h,\ufffd h \u25e6 g = \ufffdC\nh). The corresponding functions \u03c8k(t) = \u03b1(t)tk are\nlinearly independent, since the only functions \u03b1(t) such that the {\u03c8k}k\u2264K are not linearly indepen-\ndent are functions that are 0 almost everywhere, which would contradict the de\ufb01nition of \u03b1(t). Thus\nVC is invertible.\nFourier basis: Let us now consider the Fourier basis in Rd with frequency T :\n\n\u03d5n1,...,nd (t1, .., td) =\ufffdj\n\nexp\ufffd \u2212\n\n2i\u03c0njtj\n\nT\n\n\ufffd,\n\nwhere nj \u2208 {0, ..., T \u2212 1} and tj \u2208 [0, 1]. Note that this basis is orthonormal under the uniform\ndistribution on [0, 1]d. In this case we de\ufb01ne g by g(t) = \u03bb(t\nT d\u22121 ) with \u03bb =\n\ufffd 1\u2212T \u22122\n1\u2212T \u22122d (so that \ufffdg\ufffd(t)\ufffd2 = 1), thus we deduce that:\n\nT d\u22121 , ..., t T d\u22121\n\nT d\u22121 , t T\n\n1\n\nSince nk \u2208 {0, ..., T \u2212 1}, the mapping that associates \ufffdj njT j\u22121 to (n1, . . . , nd) is a bijection\nfrom {0, . . . , T \u2212 1}d to {0, . . . , T d \u2212 1}. 
Thus we can identify the family (\u03c8n1,...,nd ) with the one\ndimensional Fourier basis with frequency T d\n\u03bb , which means that the condition number \u03c1 = 1 for\nthis curve. Therefore, for a d-dimensional function f , sparse in the Fourier basis, it is suf\ufb01cient to\nsample along the curve induced by g to ensure that VC is invertible.\n\n\u03c8n1,...,nd (t) = exp\ufffd \u2212\n\n2i\u03c0t\u03bb\ufffdj njT j\u22121\n\nT d\n\n\ufffd.\n\n7\n\n\f5 Numerical Experiments\n\nIn this section, we illustrate the method of Brownian sensing in dimension one. We consider\na non-orthonormal family {\u03d5k}k\u2264K of K = 100 functions of L2([0, 2\u03c0]) de\ufb01ned by \u03d5k(t) =\n. In the experiments, we use a function f whose decomposition is 3-sparse and\nwhich is (10, 1)-H\u00a8older, and we consider a bounded observation noise \u03b7, with different noise levels,\n\ncos(tk)+cos(t(k+1))\n\n\u221a2\u03c0\n\nwhere the noise level is de\ufb01ned by \u03c32 =\ufffdN\n\nn=1 \u03b72\nn.\n\nComparison of l1\u2212minimization and Brownian Sensing\n\nComparison of l1\u2212minimization and Brownian Sensing\n\nComparison of l1\u2212minimization and Brownian Sensing\n\n200\n\n180\n\n160\n\n140\n\n120\n\n100\n\n80\n\n60\n\n40\n\n20\n\nr\no\nr\nr\ne\n\n \nc\ni\nt\na\nr\nd\na\nu\nq\nn\na\ne\nm\n\n \n\n0\n5\n\n10\n\n15\n\nwith noise variance 0\n\n20\n\n25\n\n30\n\n35\n\nnumber of sampling points\n\nr\no\nr\nr\ne\n\n \nc\ni\nt\na\nr\nd\na\nu\nq\nn\na\ne\n\n \n\nM\n\n220\n\n200\n\n180\n\n160\n\n140\n\n120\n\n100\n\n80\n\n60\n\n40\n5\n\n10\n\n15\n\n40\n\n45\n\n50\n\nwith noise variance 0.5\n\n20\n\n25\n\n30\n\n35\n\nNumber of sampling points\n\n220\n\n200\n\n180\n\n160\n\n140\n\n120\n\nr\no\nr\nr\ne\n\n \nc\ni\nt\na\nr\nd\na\nu\nq\nn\na\ne\nm\n\n \n\n40\n\n45\n\n50\n\n100\n5\n\n10\n\n15\n\nwith noise variance 1\n\n20\n\n25\n\n30\n\nnumber of sampling points\n\n35\n\n40\n\n45\n\n50\n\nFigure 2: Mean squared estimation error using Brownian sensing (plain curve) and a direct 
ℓ1-minimization solving Φα ≈ y (dashed curve), for different noise levels (σ² = 0, σ² = 0.5, σ² = 1), plotted as a function of the number of sampling points N.

In Figure 2, the plain curve represents the recovery performance, i.e., the mean squared error, of Brownian sensing, i.e., minimizing ‖a‖₁ under the constraint ‖Aa − b̂‖₂ ≤ 1.95√(2(100/N + 2)), using M = 100 Brownian motions and a regular grid of N points, as a function of N³. The dashed curve represents the mean squared error of a regular ℓ1 minimization of ‖a‖₁ under the constraint ‖Φa − y‖₂² ≤ σ² (as described e.g. in [12]), where the N samples are drawn uniformly at random over the domain. The three graphs correspond to different values of the noise level σ² (from left to right: 0, 0.5 and 1). The results are averaged over 5000 trials.

Figure 2 illustrates that, as expected, Brownian sensing outperforms the method described in [12] for noisy measurements⁴. Note also that the method described in [12] recovers the sparse vector when there is no noise, and that Brownian sensing in this case has a smoother dependency w.r.t. N. This improvement comes from the fact that we use the Hölder regularity of the function: compressed sensing may outperform Brownian sensing for arbitrarily non-regular functions.

Conclusion

In this paper, we have introduced the so-called Brownian sensing approach as a way to sample an unknown function which has a sparse representation in a given non-orthonormal basis. Our approach differs from previous attempts to apply compressed sensing in that we build a "Brownian sensing" matrix A based on a set of Brownian motions, which is independent of the function f. This enables us to guarantee nice recovery properties of A.
The function evaluations are used to estimate the right-hand side term b (stochastic integrals). In dimension d, we proposed to sample the function along a well-chosen curve, i.e., one such that the corresponding covariance matrix is invertible. We provided competitive reconstruction error rates of order O(||η||_2/√N) when the observation noise η is bounded and f is assumed to be Hölder continuous with exponent at least 1/2. We believe that the Hölder assumption is not strictly required (the smoothness of f is only used to derive good estimates of the stochastic integrals), and future work will consider weakening this assumption, possibly by considering randomized sampling designs.

Footnote 3: We assume that we know a loose bound on the noise level, here σ² ≤ 2, and we take p = 0.01.
Footnote 4: Note, however, that there is no theoretical guarantee that the method described in [12] works here, since the functions are not orthonormal.

Acknowledgements

This research was partially supported by the French Ministry of Higher Education and Research, the Nord-Pas-de-Calais Regional Council and FEDER through CPER 2007-2013, the ANR projects EXPLO-RA (ANR-08-COSI-004) and Lampada (ANR-09-EMER-007), by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 231495 (project CompLACS), and by Pascal-2.

References

[1] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263, 2008.

[2] G. Bennett. Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57(297):33–45, 1962.

[3] E. Candès and J. Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems, 23:969–985, 2007.

[4] E. Candès and T. Tao.
The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics, 35(6):2313–2351, 2007.

[5] E.J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.

[6] E.J. Candès, J.K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.

[7] D.L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.

[8] D.L. Donoho and P.B. Stark. Uncertainty principles and signal recovery. SIAM Journal on Applied Mathematics, 49(3):906–931, 1989.

[9] M. Fornasier and H. Rauhut. Compressive sensing. In O. Scherzer, editor, Handbook of Mathematical Methods in Imaging. Springer, to appear.

[10] S. Foucart and M.J. Lai. Sparsest solutions of underdetermined linear systems via lq-minimization for 0 < q ≤ 1. Applied and Computational Harmonic Analysis, 26(3):395–407, 2009.

[11] V. Koltchinskii. The Dantzig selector and sparsity oracle inequalities. Bernoulli, 15(3):799–828, 2009.

[12] H. Rauhut. Compressive sensing and structured random matrices. Theoretical Foundations and Numerical Methods for Sparse Recovery, 9, 2010.

[13] H. Rauhut and R. Ward. Sparse Legendre expansions via l1 minimization. Arxiv preprint arXiv:1003.0251, 2010.

[14] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Communications on Pure and Applied Mathematics, 61(8):1025–1045, 2008.

[15] R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.

[16] S.A. van de Geer. The deterministic Lasso.
Seminar für Statistik, Eidgenössische Technische Hochschule (ETH) Zürich, 2007.

[17] S.A. van de Geer and P. Bühlmann. On the conditions used to prove oracle results for the Lasso. Electronic Journal of Statistics, 3:1360–1392, 2009.

[18] P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.