{"title": "Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere", "book": "Advances in Neural Information Processing Systems", "page_first": 1132, "page_last": 1143, "abstract": "Multichannel blind deconvolution is the problem of recovering an unknown signal $f$ and multiple unknown channels $x_i$ from convolutional measurements $y_i=x_i \\circledast f$ ($i=1,2,\\dots,N$). We consider the case where the $x_i$'s are sparse, and convolution with $f$ is invertible. Our nonconvex optimization formulation solves for a filter $h$ on the unit sphere that produces sparse output $y_i\\circledast h$. Under some technical assumptions, we show that all local minima of the objective function correspond to the inverse filter of $f$ up to an inherent sign and shift ambiguity, and all saddle points have strictly negative curvatures. This geometric structure allows successful recovery of $f$ and $x_i$ using a simple manifold gradient descent algorithm with random initialization. Our theoretical findings are complemented by numerical experiments, which demonstrate superior performance of the proposed approach over the previous methods.", "full_text": "Global Geometry of Multichannel\n\nSparse Blind Deconvolution on the Sphere\n\nYanjun Li\n\nCSL and Department of ECE\n\nUniversity of Illinois\nUrbana-Champaign\n\nYoram Bresler\n\nCSL and Department of ECE\n\nUniversity of Illinois\nUrbana-Champaign\n\nyli145@illinois.edu\n\nybresler@illinois.edu\n\nAbstract\n\nMultichannel blind deconvolution is the problem of recovering an unknown signal\nf and multiple unknown channels xi from convolutional measurements yi = xi(cid:126)f\n(i = 1, 2, . . . , N). We consider the case where the xi\u2019s are sparse, and convolution\nwith f is invertible. Our nonconvex optimization formulation solves for a \ufb01lter\nh on the unit sphere that produces sparse output yi (cid:126) h. 
Under some technical assumptions, we show that all local minima of the objective function correspond to the inverse filter of f up to an inherent sign and shift ambiguity, and all saddle points have strictly negative curvatures. This geometric structure allows successful recovery of f and x_i using a simple manifold gradient descent algorithm with random initialization. Our theoretical findings are complemented by numerical experiments, which demonstrate superior performance of the proposed approach over the previous methods.

1 Introduction

Blind deconvolution, which aims to recover unknown vectors x and f from their convolution y = x ⊛ f, has been extensively studied, especially in the context of image deblurring [1, 2, 3]. Recently, algorithms with theoretical guarantees have been proposed for single channel blind deconvolution [4, 5, 6, 7, 8, 9, 10]. In order for the problem to be well-posed, these previous methods assume that both x and f are constrained, to either reside in a known subspace or be sparse over a known dictionary [11, 12]. However, these methods cannot be applied if f (or x) is unconstrained, or does not have a subspace or sparsity structure.

In many applications in communications [13], imaging [14], and computer vision [15], convolutional measurements y_i = x_i ⊛ f are taken between a single signal (resp. filter) f and multiple filters (resp. signals) {x_i}_{i=1}^N. We call such problems multichannel blind deconvolution (MBD). Importantly, in this multichannel setting, one can assume that only {x_i}_{i=1}^N are structured, and f is unconstrained. While there has been abundant work on single channel blind deconvolution (with both f and x constrained), research on MBD (with f unconstrained) is relatively limited. Traditional MBD works assumed that the channels x_i are FIR filters [16, 17, 18] or IIR filters [19], and proposed to solve MBD using subspace methods.
The problem is generally ill-conditioned, and the recovery using the subspace methods is highly sensitive to noise [20].

In this paper, while retaining the unconstrained form of f, we consider a different structure of the multiple channels {x_i}_{i=1}^N: sparsity. The resulting problem is termed multichannel sparse blind deconvolution (MSBD). The sparsity structure arises in many real-world applications.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Opportunistic underwater acoustics: Underwater acoustic channels are sparse in nature [21]. Estimating such sparse channels with an array of receivers using opportunistic sources (e.g., shipping noise) involves a blind deconvolution problem with multiple unknown sparse channels [22, 23].

Reflection seismology: Thanks to the layered earth structure, reflectivity in seismic signals is sparse. It is of great interest to simultaneously recover the filter (also known as the wavelet) and the seismic reflectivity along the multiple propagation paths between the source and the geophones [24].

Functional MRI: Neural activity signals are composed of brief spikes and are considered sparse. However, observations via functional magnetic resonance imaging (fMRI) are distorted by convolution with the hemodynamic response function. A blind deconvolution procedure can reveal the underlying neural activity [25].

Super-resolution fluorescence microscopy: In super-resolution fluorescence microscopic imaging, photoswitchable probes are activated stochastically to create multiple sparse images and allow microscopy of nanoscale cellular structures [26, 27]. One can further improve the resolution via a computational deconvolution approach, which mitigates the effect of the point spread function (PSF) of the microscope [28].
It is sometimes difficult to obtain the PSF (e.g., due to unknown aberrations), and one needs to jointly estimate the microscopic images and the PSF [29].

Previous approaches to MSBD have provided efficient iterative algorithms to compute maximum likelihood (ML) estimates of parametric models of the channels {x_i}_{i=1}^N [23], or maximum a posteriori (MAP) estimates in various Bayesian frameworks [24, 15]. However, these algorithms usually do not have theoretical guarantees. Recently, guaranteed algorithms for MSBD have been developed. Wang and Chi [30] proposed a convex formulation of MSBD based on ℓ1 minimization. Li et al. [31] solved a nonconvex formulation using projected gradient descent, and proposed an initialization algorithm to compute a sufficiently good starting point. However, the theoretical guarantees of these algorithms require restrictive assumptions (e.g., f has one dominant entry that is significantly larger than the other entries [30], or f has an approximately flat spectrum [31]).

We would like to emphasize that, while earlier papers on MBD [16, 17, 18, 19] consider a linear convolution model, more recent guaranteed methods for MSBD [30, 31] consider a circular convolution model. By zero padding the signal and the filter, one can rewrite a linear convolution as a circular convolution. In practice, circular convolution is often used to approximate a linear convolution when the filter has a compact support or decays fast [32], and the signal has finite length or satisfies a circular boundary condition [1]. The accelerated computation of circular convolution via the fast Fourier transform (FFT) is especially beneficial in 2D or 3D applications [1, 29]. Multichannel blind deconvolution with a circular convolution model is also related to blind gain and phase calibration with Fourier measurements [33, 34, 35, 36, 37].

In this paper, we consider MSBD with circular convolution.
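The equivalence between zero-padded linear convolution and circular convolution is easy to verify numerically. The following is an illustrative sketch (ours, not part of the original paper), using the FFT implementation of circular convolution:

```python
import numpy as np

def circ_conv(x, f):
    """Circular convolution of two equal-length vectors via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(f)))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # signal
f = rng.standard_normal(8)    # filter

lin = np.convolve(x, f)       # linear convolution, length 16 + 8 - 1 = 23

# Zero-pad both vectors to the length of the linear convolution;
# the circular convolution of the padded vectors then coincides with it.
m = len(x) + len(f) - 1
lin_via_circ = circ_conv(np.pad(x, (0, m - len(x))), np.pad(f, (0, m - len(f))))

print(np.allclose(lin, lin_via_circ))  # True
```

The FFT route costs O(m log m) rather than O(m²), which is the acceleration referred to above.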
In addition to the sparsity prior on the channels {x_i}_{i=1}^N, we impose, without loss of generality, the constraint that f has unit ℓ2 norm, i.e., f is on the unit sphere. (This eliminates the scaling ambiguity inherent in the MBD problem.) We show that our sparsity promoting objective function has a nice geometric landscape on the unit sphere: (S1) all local minima correspond to signed shifted versions of the desired solution, and (S2) the objective function is strongly convex in neighborhoods of the local minima, and has strictly negative curvature directions in neighborhoods of local maxima and saddle points. Similar geometric analysis has been conducted for dictionary learning [38], phase retrieval [39], and single channel sparse blind deconvolution [10]. Recently, Mei et al. [40] analyzed the geometric structure of the empirical risk of a class of machine learning problems (e.g., nonconvex binary classification, robust regression, and Gaussian mixture models). This paper is the first such analysis for MSBD.

Although our analysis of global geometry shares a similar roadmap with previous works [10, 38, 39, 40], much of our theoretical analysis is tailored for MSBD. For example, our partition of the unit sphere into three regions (of strong convexity, negative curvature, and large gradient, respectively) is carefully crafted for our objective function, and is closely related to our error bound. We leverage tools that are commonly used in related works, such as concentration inequalities and union bounds, to prove the geometric properties. However, our bounds are derived specifically for MSBD, under new assumptions. For example, single channel sparse blind deconvolution [10], with sparse x, requires f to have compact support.
In contrast, in this work on MSBD, other than invertibility, we make no assumptions on f.

Properties (S1) and (S2) allow simple manifold optimization algorithms to find the ground truth in the nonconvex formulation. Unlike the second order methods in previous works [41, 39], we take advantage of recent advances in the analysis of first-order methods [42, 43], and prove that a simple manifold gradient descent algorithm, with random initialization and a fixed step size, can accurately recover a signed shifted version of the ground truth in polynomial time almost surely. This is the first guaranteed algorithm for MSBD that does not rely on restrictive assumptions on f or {x_i}_{i=1}^N.

Recently, many optimization methods have been shown to escape saddle points of objective functions with benign landscapes, e.g., gradient descent [44, 45], stochastic gradient descent [46], perturbed gradient descent [47], Natasha [48, 49], and FastCubic [50]. Similarly, optimization methods over Riemannian manifolds that can escape saddle points include manifold gradient descent [43], the trust region method [41, 39], and the negative curvature method [51]. Our main result shows that these algorithms can be applied to MSBD thanks to the favorable geometric structure of our objective.

2 MSBD on the Sphere

2.1 Problem Statement

In MSBD, the measurements y_1, y_2, . . . , y_N ∈ R^n are the circular convolutions of unknown sparse vectors x_1, x_2, . . . , x_N ∈ R^n and an unknown vector f ∈ R^n, i.e., y_i = x_i ⊛ f. In this paper, we solve for {x_i}_{i=1}^N and f from {y_i}_{i=1}^N. One can rewrite the measurements as Y = C_f X, where C_f represents the circulant matrix whose first column is f, and Y = [y_1, y_2, . . . , y_N] and X = [x_1, x_2, . . . , x_N] are n × N matrices. Without structure, one can fit the measurements by choosing any invertible circulant matrix C_f and computing X = C_f^{-1} Y.
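Because C_f is diagonalized by the DFT, both the forward model Y = C_f X and the inversion X = C_f^{-1} Y reduce to entrywise operations in the frequency domain. A small illustrative sketch (ours, with arbitrary dimensions and sparsity level):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 64, 8
f = rng.standard_normal(n)   # a Gaussian f has a nowhere-zero DFT almost surely
X = rng.standard_normal((n, N)) * (rng.random((n, N)) < 0.1)  # sparse channels

# y_i = x_i ⊛ f for all channels at once: multiply column-wise DFTs by DFT(f)
F_f = np.fft.fft(f)[:, None]
Y = np.real(np.fft.ifft(np.fft.fft(X, axis=0) * F_f, axis=0))

# Given the true f, the inversion C_f^{-1} Y is entrywise division in frequency
X_hat = np.real(np.fft.ifft(np.fft.fft(Y, axis=0) / F_f, axis=0))
print(np.allclose(X_hat, X))  # True
```

Of course, in MSBD f is unknown, so this exact inversion is unavailable; the point of the discussion above is that any invertible circulant matrix explains Y equally well until the sparsity of X is brought in.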
The fact that X is sparse narrows down the search space.

Even with sparsity, the problem suffers from inherent scale and shift ambiguities. Suppose S_j : R^n → R^n denotes a circular shift by j positions, i.e., S_j(x)(k) = x(k-j) for j, k ∈ [n]. Here we use x(j) to denote the j-th entry of x ∈ R^n (with the index treated modulo n). Note that we have y_i = x_i ⊛ f = (α S_j(x_i)) ⊛ (α^{-1} S_{-j}(f)) for every nonzero α ∈ R and j ∈ [n]. Therefore, MSBD has equivalent solutions generated by scaling and circularly shifting {x_i}_{i=1}^N and f.

Throughout this paper, we assume that the circular convolution with the signal f is invertible, i.e., there exists a filter g such that f ⊛ g = e_1 (the first standard basis vector). Equivalently, C_f is an invertible matrix, and the discrete Fourier transform (DFT) of f is nonzero everywhere. Since y_i ⊛ g = x_i ⊛ f ⊛ g = x_i, one can find g by solving the following optimization problem:

(P0)  min_{h ∈ R^n} (1/N) Σ_{i=1}^N ‖C_{y_i} h‖_0,  s.t. h ≠ 0.

The constraint eliminates the trivial solution h = 0. If the solution to MSBD is unique up to the aforementioned ambiguities, then the only minimizers of (P0) are h = α S_j(g) (α ≠ 0, j ∈ [n]).

2.2 Smooth Formulation

Minimizing the non-smooth ℓ0 "norm" is usually challenging. Instead, one can choose a smooth surrogate function for sparsity. It is well-known that minimizing the ℓ1 norm can lead to sparse solutions [52]. An intuitive explanation is that the sparse points on the unit ℓ2 sphere (which we call the unit sphere from now on) have the smallest ℓ1 norm. As demonstrated in Figure 1, these sparse points also have the largest ℓ4 norm.
Therefore, maximizing the (cid:96)4 norm, a surrogate for the\n\u201cspikiness\u201d [53] of a vector, is akin to minimizing its sparsity.\nHere, we make two observations: (1) one can eliminate the\nscaling ambiguity by restricting h to the unit sphere Sn\u22121; (2) sparse recovery can be achieved by\nmaximizing (cid:107)\u00b7(cid:107)4\n\n4. Based on these observations, we adopt the following optimization problem:\n\nFigure 1: Unit (cid:96)1, (cid:96)2, and (cid:96)4 spheres\nin 2-D.\n\n(P1) min\nh\u2208Rn\n\n\u2212 1\n4N\n\n(cid:107)CyiRh(cid:107)4\n4,\n\ns.t. (cid:107)h(cid:107) = 1.\n\nN(cid:88)\n\ni=1\n\nN(cid:88)\n\ni=1\n\n3\n\n\f(cid:80)N\ni=1 C(cid:62)\n\nyi\n\ni=1, we explain how the preconditioner R works.\n\nCyi)\u22121/2 \u2208 Rn\u00d7n is a preconditioner, where \u03b8 is a parameter that\ni=1. In Section 3, under speci\ufb01c probabilistic assumptions\n\nThe matrix R := ( 1\n\u03b8nN\nis proportional to the sparsity level of {xi}N\non {xi}N\nProblem (P1) can be solved using \ufb01rst-order or second-order optimization methods over Riemannian\nmanifolds. The main result of this paper provides a geometric view of the objective function over the\nsphere Sn\u22121 (see Figure 3). We show that some off-the-shelf optimization methods can be used to\nobtain a solution \u02c6h close to a scaled and circularly shifted version of the ground truth. Speci\ufb01cally, \u02c6h\nsatis\ufb01es Cf R\u02c6h \u2248 \u00b1ej for some j \u2208 [n], i.e., R\u02c6h is approximately a signed and shifted version of the\ninverse of f. Given solution \u02c6h to (P1), one can recover f and xi (i = 1, 2, . . . , N) as follows:\n\n\u02c6f = F\u22121(cid:2)F(R\u02c6h)(cid:12)\u22121(cid:3),\n\n\u02c6xi = CyiR\u02c6h.\n\n(1)\n\nHere, we use x(cid:12)\u22121 to denote the entrywise inverse of x.\n\n3 Global Geometric View\nIn this paper, we assume that {xi}N\n(A1) The channels {xi}N\n\ni=1 are random sparse vectors, and f is invertible:\n\ni=1 follow a Bernoulli-Rademacher model. 
More precisely, x_i(j) = A_ij B_ij are independent random variables, the B_ij's follow a Bernoulli distribution Ber(θ), and the A_ij's follow a Rademacher distribution (taking values 1 and -1, each with probability 1/2).

(A2) The circular convolution with the signal f is invertible. We use κ to denote the condition number of f, defined as κ := max_j |(Ff)(j)| / min_k |(Ff)(k)| = σ_1(C_f)/σ_n(C_f), i.e., the ratio of the largest and smallest magnitudes of the DFT of f.

The Bernoulli-Rademacher model is a special case of the Bernoulli–sub-Gaussian models. The derivation in this paper can be repeated for other sub-Gaussian nonzero entries, with different tail bounds. We use the Rademacher distribution for simplicity.

Let φ(x) = -(1/4)‖x‖_4^4. Its gradient and Hessian are given by ∇φ(x)(j) = -x(j)^3 and H_φ(x)(jk) = -3x(j)^2 δ_jk. (We use H(jk) to denote the entry of H ∈ R^{n×n} in the j-th row and k-th column, and δ_jk to denote the Kronecker delta.) Then the objective function in (P1) is L(h) = (1/N) Σ_{i=1}^N φ(C_{y_i} R h), where R = ((1/(θnN)) Σ_{i=1}^N C_{y_i}^⊤ C_{y_i})^{-1/2}. The gradient and Hessian are ∇L(h) = (1/N) Σ_{i=1}^N R^⊤ C_{y_i}^⊤ ∇φ(C_{y_i} R h), and H_L(h) = (1/N) Σ_{i=1}^N R^⊤ C_{y_i}^⊤ H_φ(C_{y_i} R h) C_{y_i} R. Since L(h) is to be minimized over S^{n-1}, we use optimization methods over Riemannian manifolds [54]. To this end, we define the tangent space at h ∈ S^{n-1} as {z ∈ R^n : z ⊥ h} (see Figure 2).
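The circulant structure also makes the preconditioner R cheap to apply: each C_{y_i}^⊤ C_{y_i} is circulant with Fourier-domain eigenvalues |(F y_i)(ω)|², so R acts as a zero-phase filter. The sketch below (our own illustration, not the authors' code) generates Bernoulli-Rademacher channels per (A1) and evaluates L(h); all dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, theta = 32, 64, 0.1

# (A1): Bernoulli(theta) support with Rademacher (+/-1) nonzero values
X = rng.choice([-1.0, 1.0], size=(n, N)) * (rng.random((n, N)) < theta)
f = rng.standard_normal(n)
Fy = np.fft.fft(X, axis=0) * np.fft.fft(f)[:, None]  # DFTs of y_i = x_i ⊛ f

# R = ((1/(theta*n*N)) * sum_i C_{y_i}^T C_{y_i})^(-1/2) is circulant:
# its Fourier-domain eigenvalues are (mean_i |DFT(y_i)|^2 / (theta*n))^(-1/2)
r_freq = (np.mean(np.abs(Fy)**2, axis=1) / (theta * n))**(-0.5)

def L(h):
    """Objective of (P1): -(1/(4N)) * sum_i ||C_{y_i} R h||_4^4."""
    Rh_hat = r_freq * np.fft.fft(h)                           # DFT of R h
    out = np.real(np.fft.ifft(Fy * Rh_hat[:, None], axis=0))  # columns = C_{y_i} R h
    return -np.sum(out**4) / (4 * N)
```

Evaluating L(h) this way costs O(nN log n) per call instead of the O(n²N) of explicit circulant products.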
We study the Riemannian gradient and Riemannian Hessian of L(h) (the gradient and Hessian along the tangent space at h ∈ S^{n-1}): ∇̂L(h) = P_{h⊥} ∇L(h), and Ĥ_L(h) = P_{h⊥} H_L(h) P_{h⊥} - ⟨∇L(h), h⟩ P_{h⊥}, where P_{h⊥} = I - hh^⊤ is the projection onto the tangent space at h. We refer the readers to [54] for a more comprehensive discussion of these concepts.

Figure 2: A demonstration of the tangent space of S^{n-1} at h, the origin of which is translated to h. The Riemannian gradient and Riemannian Hessian are defined on tangent spaces.

The toy example in Figure 3 demonstrates the geometric structure of the objective function on S^{n-1}. (As shown later, the quantity E L''(h) is, up to an unimportant rotation of the coordinate system, a good approximation to L(h).) The local minima correspond to signed shifted versions of the ground truth (Figure 3(a)). The Riemannian gradient is zero at stationary points, including local minima, saddle points, and local maxima of the objective function when restricted to the sphere S^{n-1} (Figure 3(b)). The Riemannian Hessian is positive definite in the neighborhoods of local minima, and has at least one strictly negative eigenvalue in the neighborhoods of local maxima and saddle points (Figure 3(c)). We say that a stationary point is a "strict saddle point" if the Riemannian Hessian has at least one strictly negative eigenvalue. Our main result, Theorem 3.1, formalizes the observation that L(h) only has two types of stationary points: (1) local minima, which are close to signed shifted versions of the ground truth, and (2) strict saddle points. Please refer to the supplementary material for the full proof.

Figure 3: Geometric structure of the objective function over the sphere.
For n = 3, we plot the following quantities on the sphere S^2: (a) E L''(h), (b) ‖E ∇̂L''(h)‖, and (c) min_{z⊥h, ‖z‖=1} z^⊤ E Ĥ_{L''}(h) z.

Theorem 3.1. Suppose Assumptions (A1) and (A2) are satisfied, and the Bernoulli probability satisfies 1/n ≤ θ < 1/3. Let κ be the condition number of f, and let ρ < 10^{-3} be a small tolerance constant. There exist constants c_1, c'_1, c_2, c'_2 > 0 (depending only on θ), such that: if N > max{ c_1 n^9 log n / ρ^4, c_2 κ^8 n^8 log n / ρ^4 }, then with probability at least 1 - n^{-c'_1} - n^{-c'_2}, every local minimum h* in (P1) is close to a signed shifted version of the ground truth, i.e., for some j ∈ [n]: ‖C_f R h* ± e_j‖ ≤ 2κ√ρ. Moreover, one can partition S^{n-1} into three sets H_1, H_2, and H_3, which, for some c(n, θ, ρ) > 0, satisfy:

◦ L(h) is strongly convex in H_1, i.e., min_{‖z‖=1, z⊥h} z^⊤ Ĥ_L(h) z ≥ c(n, θ, ρ) > 0.
◦ L(h) has negative curvature in H_2, i.e., min_{‖z‖=1, z⊥h} z^⊤ Ĥ_L(h) z ≤ -c(n, θ, ρ) < 0.
◦ L(h) has a descent direction in H_3, i.e., ‖∇̂L(h)‖ ≥ c(n, θ, ρ) > 0.

Clearly, all the stationary points of L(h) on S^{n-1} belong to H_1 or H_2. The stationary points in H_1 are local minima, and the stationary points in H_2 are strict saddle points.

Proof Sketch. Note that R = ((1/(θnN)) Σ_{i=1}^N C_{y_i}^⊤ C_{y_i})^{-1/2} asymptotically converges to (C_f^⊤ C_f)^{-1/2} as N increases.
Therefore, L(h) can be approximated by L'(h) = (1/N) Σ_{i=1}^N φ(C_{y_i} (C_f^⊤ C_f)^{-1/2} h). Since C_f (C_f^⊤ C_f)^{-1/2} is an orthogonal matrix, one can study the objective function L''(h') = (1/N) Σ_{i=1}^N φ(C_{x_i} C_f (C_f^⊤ C_f)^{-1/2} h) = (1/N) Σ_{i=1}^N φ(C_{x_i} h') with h' = C_f (C_f^⊤ C_f)^{-1/2} h, which is a rotated version of L'(h) on the sphere. Our analysis consists of three parts:

(1) Geometric structure of E L'': We first bound min_{‖z‖=1, z⊥h} z^⊤ E Ĥ_{L''}(h) z, which is strictly positive near its local minima, and strictly negative near all other stationary points (the strict saddle points). At the same time, at all other points on S^{n-1} (the points further away from stationary points), the Riemannian gradient of E L'' is bounded away from zero.

(2) Deviation of L'' (or its rotated version L') from E L'': We bound ‖∇̂L''(h) - E ∇̂L''(h)‖ and ‖Ĥ_{L''}(h) - E Ĥ_{L''}(h)‖ using the matrix Bernstein inequality and union bounds.

(3) Difference between L and L': We bound ‖∇̂L(h) - ∇̂L'(h)‖ and ‖Ĥ_L(h) - Ĥ_{L'}(h)‖ using the matrix Bernstein inequality and Lipschitz continuity of ∇̂L(h) and Ĥ_L(h).

Theorem 3.1 follows by combining the above results.

4 Optimization Method

Recently, first-order methods have been shown to escape strict saddle points with random initialization [44, 45]. In this paper, we use the manifold gradient descent algorithm studied by Lee et al.
[43]. One can initialize the algorithm with a random h(0), and use the following iterative update:

h(t+1) = A(h(t)) := P_{S^{n-1}}( h(t) - γ ∇̂L(h(t)) ).  (2)

Each iteration takes a Riemannian gradient descent step in the tangent space, and does a retraction by normalizing the iterate (projecting onto S^{n-1}). Using the geometric structure introduced in Section 3, and some technical results in [42, 43], the following result gives a theoretical guarantee for manifold gradient descent for our formulation of MSBD: convergence to an accurate estimate (up to the inherent sign and shift ambiguity) of the true solution.

Theorem 4.1. Suppose that the geometric structure in Theorem 3.1 is satisfied. If manifold gradient descent (2) is initialized with a random h(0) drawn from a uniform distribution on S^{n-1}, and the step size is chosen as γ = 1/(128 n^3), then (2) converges to a local minimum of L(h) on S^{n-1} almost surely. In particular, after at most T = 4096 n^8 / (θ^2 (1-3θ)^2 ρ^4) iterations, h(T) ∈ H_1. Moreover, for some j ∈ [n], ‖C_f R h(T) ± e_j‖ ≤ 2κ√ρ.

Corollary 4.2. If the conditions of Theorem 4.1 are satisfied, then the recovered f̂ and x̂_i in (1), computed using the output of manifold gradient descent ĥ = h(T), satisfy (for some j ∈ [n]):

‖x̂_i ± S_j(x_i)‖ / ‖x_i‖ ≤ 2κ√(ρn),   ‖f̂ ± S_{-j}(f)‖ / ‖f‖ ≤ 2κ√(ρn) / (1 - 2κ√(ρn)).

Theorem 4.1 and Corollary 4.2 show that, with a random initialization and a fixed step size, manifold gradient descent outputs, in polynomial time, a solution that is close to a signed and shifted version of the ground truth.
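Update (2) is simple to implement. The sketch below is our own illustration (not the authors' released code); to keep it self-contained it is exercised on the toy objective -(1/4)‖h‖_4^4 over the sphere, whose Euclidean gradient -h^3 matches the per-channel gradient ∇φ above and whose local minima are exactly the signed standard basis vectors:

```python
import numpy as np

def manifold_gradient_descent(egrad, n, gamma, T, seed=0):
    """Update (2): Riemannian gradient step, then retraction onto the sphere.
    `egrad` is the Euclidean gradient of the objective."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(n)
    h /= np.linalg.norm(h)               # random initialization on S^{n-1}
    for _ in range(T):
        g = egrad(h)
        rgrad = g - (h @ g) * h          # project gradient onto tangent space at h
        h = h - gamma * rgrad            # gradient step in the tangent space
        h /= np.linalg.norm(h)           # retraction: normalize back onto S^{n-1}
    return h

# Toy check: minimizing -(1/4)||h||_4^4 over the sphere drives h toward ±e_j
h = manifold_gradient_descent(lambda v: -v**3, n=8, gamma=0.1, T=20000, seed=3)
```

The step size and iteration count here are illustrative only; the theoretically prescribed γ = 1/(128 n^3) and T from Theorem 4.1 are far more conservative than what works in practice (Section 5 uses γ = 0.1 and T = 100).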
We prove these results in the supplementary material.

5 Numerical Experiments

5.1 Deconvolution with Synthetic Data

In this section, we examine the empirical performance of manifold gradient descent (2) in solving MSBD (P1). We synthesize {x_i}_{i=1}^N following the Bernoulli-Rademacher model, and synthesize f following a Gaussian distribution N(0_{n×1}, I_n). In all experiments, we run manifold gradient descent for T = 100 iterations, with a fixed step size of γ = 0.1.

Recall that the desired h is a signed shifted version of the ground truth, i.e., C_f R h = ±e_j (j ∈ [n]). Therefore, to evaluate the accuracy of the output h(T), we compute C_f R h(T) with the true f, and declare successful recovery if ‖C_f R h(T)‖_∞ / ‖C_f R h(T)‖ > 0.95, or equivalently, if max_{j∈[n]} |cos ∠(C_f R h(T), e_j)| > 0.95. We compute the success rate based on 100 Monte Carlo instances. In a typical successful instance, h(t) converges to an accurate estimate of the ground truth after about 50 iterations (as shown by the error and accuracy plots in Figures 4(d) and 4(h)).

In the first experiment, we fix θ = 0.1 (the sparsity level, i.e., the mean of the Bernoulli distribution), and run experiments with n = 32, 64, . . . , 256 and N = 32, 64, . . . , 256 (see Figure 4(a)). In the second experiment, we fix n = 256, and run experiments with θ = 0.02, 0.04, . . . , 0.16 and N = 32, 64, . . . , 256 (see Figure 4(b)). The empirical phase transitions suggest that, for a relatively small sparsity level (e.g., θ < 0.16), there exists a constant c > 0 such that manifold gradient descent can recover a signed shifted version of the ground truth with N ≥ cnθ.

In the third experiment, we examine the phase transition with respect to N and the condition number κ of f, which is the ratio of the largest and smallest magnitudes of its DFT.
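Both quantities used in these experiments, the condition number κ and the success criterion, can be evaluated directly in the Fourier domain. A minimal sketch (the function names are ours, not from the paper):

```python
import numpy as np

def condition_number(f):
    """kappa(f) = max_j |DFT(f)(j)| / min_k |DFT(f)(k)|."""
    mags = np.abs(np.fft.fft(f))
    return mags.max() / mags.min()

def is_success(f, Rh, thresh=0.95):
    """Recovery succeeds if C_f R h is concentrated on a single coordinate,
    i.e., ||C_f R h||_inf / ||C_f R h||_2 > thresh."""
    v = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(Rh)))  # C_f (R h)
    return np.abs(v).max() / np.linalg.norm(v) > thresh
```

For example, with f = e_1 (so κ = 1), any shifted basis vector R h = e_j passes the test with ratio exactly 1, while the flat vector (1, . . . , 1)/√n fails with ratio 1/√n.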
To synthesize f with a specific κ, we generate the DFT f̃ of f at random with the following distribution: (1) The DFT f̃ is symmetric, i.e., f̃(j) = f̃(n+2-j), so that f is real. (2) The phase of f̃(j) follows a uniform distribution on [0, 2π), except for the phases of f̃(1) and f̃(n/2+1) (if n is even), which are always 0 for symmetry. (3) The gains of f̃ follow a uniform distribution on [1, κ]. We fix n = 256 and θ = 0.1, and run experiments with κ = 1, 2, 4, . . . , 128 and N = 32, 64, . . . , 256 (see Figure 4(c)). The phase transition suggests that the number N for successful empirical recovery is not sensitive to the condition number κ.

Manifold gradient descent is robust against noise. We repeat the above experiments with noisy measurements: y_i = x_i ⊛ f + σε_i, where ε_i follows a Gaussian distribution N(0_{n×1}, I_n). The phase transitions for σ = 0.1√(nθ) (SNR ≈ 20 dB) are shown in Figures 4(e), 4(f), and 4(g).
For a reasonable noise level, the number N of noisy measurements we need to accurately recover a signed shifted version of the ground truth is roughly the same as with noiseless measurements.

Figure 4: Empirical phase transition (grayscale values represent success rates). The first column shows the phase transitions of N versus n. The second column shows the phase transitions of N versus θ. The third column shows the phase transitions of N versus κ. (a)-(c) are the results for the noiseless case. (e)-(g) are the results for SNR ≈ 20 dB. (d) and (h) show the error ‖C_f R h(t) - e_j‖ and the accuracy ‖C_f R h(t)‖_∞ / ‖C_f R h(t)‖ as functions of the iteration number t, respectively.

5.2 Blind Gain and Phase Calibration

In this section, we consider the blind calibration problem [31]. Suppose that a sensing system takes Fourier measurements of unknown signals, with sensors that have unknown gains and phases, i.e., ỹ_i = diag(f̃) F x_i, where x_i are the targeted unknown sparse signals, F is the DFT matrix, and the entries of f̃ represent the unknown gains and phases.
In sensor array processing [55], the supports of the x_i's are identical, and represent the directions of arrival of incoming sources. The simultaneous recovery of f̃ and the x_i's is equivalent to MSBD in the frequency domain.

Clearly, Assumption (A1) is not satisfied in this case. For complex f, x_i ∈ C^n, we solve:

min_{h ∈ C^n} (1/N) Σ_{i=1}^N φ(Re(C_{y_i} R h)) + φ(Im(C_{y_i} R h)),  s.t. ‖h‖ = 1,

where R := ((1/(θnN)) Σ_{i=1}^N C_{y_i}^H C_{y_i})^{-1/2} ∈ C^{n×n}, and (·)^H represents the Hermitian transpose. If one treats the real and imaginary parts of h separately, then this optimization in C^n can be recast into R^{2n}, and the gradient with respect to Re(h) and Im(h) can be used in first-order methods. This is related to Wirtinger gradient descent algorithms (see the discussion in [56]). The Riemannian gradient with respect to h is P_{(R·h)⊥}((1/N) Σ_{i=1}^N R^H C_{y_i}^H w_i(h)), where w_i(h) = ∇φ(Re(C_{y_i} R h)) + √-1 ∇φ(Im(C_{y_i} R h)), and P_{(R·h)⊥} represents the projection onto the tangent space at h in S^{2n-1} ⊂ R^{2n}: P_{(R·h)⊥} z = z - Re(h^H z) · h. In the complex case, one can initialize the manifold gradient descent algorithm with a random h(0), for which [Re(h(0))^⊤, Im(h(0))^⊤]^⊤ follows a uniform distribution on S^{2n-1}.

Figure 5: Empirical phase transition of N versus s, given that n = 128. (a) Manifold gradient descent. (b) Truncated power iteration [31]. (c) Off-the-grid algebraic method [57].
(d) Off-the-grid optimization approach [58].

We compare manifold gradient descent (with random initialization) with three blind calibration algorithms that solve MSBD in the frequency domain: (i) truncated power iteration [31] (initialized with f(0) = e_1 and x_i(0) = 0); (ii) an off-the-grid algebraic method [57] (simplified from [55]); and (iii) an off-the-grid optimization approach [58].

We consider Gaussian random f̃ ∼ CN(0_{n×1}, I_n), and jointly s-sparse {x_i}_{i=1}^N, for which the support is chosen uniformly at random, and the nonzero entries of {x_i}_{i=1}^N follow a complex Gaussian distribution CN(0, 1). We fix n = 128, and run experiments for N = 16, 32, 48, . . . , 128, and s = 2, 4, 6, . . . , 16. We say that the recovery is successful if the accuracy (the cosine of the angle between the true signal and the recovered signal) is greater than 0.7.

By the phase transitions in Figure 5, manifold gradient descent and truncated power iteration are both successful when N ≥ 48 and s ≤ 8. However, although truncated power iteration achieves higher success rates when both N and s are small, it fails for s > 8 even with a large N. In contrast, manifold gradient descent can recover channels with s = 16 when N ≥ 80.

The off-the-grid methods are designed for the case that the unknown sparse signals do not reside on a discrete grid (i.e., "off the grid"), and hence provide better recovery than the first two algorithms in that setting. However, the off-the-grid methods rely on the properties of the covariance matrix (1/N) Σ_{i=1}^N y_i y_i^H, and require a much larger N than the first two algorithms to achieve high success rates when the sparse signals actually lie on a regular grid (see the phase transitions in Figure 5).

5.3 Super-Resolution Fluorescence Microscopy

Manifold gradient descent can be applied to deconvolution of time-resolved fluorescence microscopy images.
The goal is to recover sharp images $x_i$'s from observations $y_i$'s that are blurred by an unknown PSF $f$.

We use a publicly available microtubule dataset [28], which contains $N = 626$ images (Figure 6(a)). Since fluorophores are turned on and off stochastically, the images $x_i$'s are random sparse samples of the $64 \times 64$ microtubule image (Figure 6(e)). The observations $y_i$'s (Figure 6(b), 6(f)) are synthesized by circular convolutions with the PSF in Figure 6(i). The recovered images (Figure 6(c), 6(g)) and kernel (Figure 6(j)) clearly demonstrate the effectiveness of our approach in this setting.

Blind deconvolution is less sensitive to instrument calibration error than non-blind deconvolution. If the PSF used in a non-blind deconvolution method fails to account for certain optical aberrations, the resulting images may suffer from spurious artifacts. For example, if we use a miscalibrated PSF (Figure 6(k)) in non-blind image reconstruction using FISTA [59], then the recovered images (Figure 6(d), 6(h)) suffer from serious spurious artifacts.

6 Conclusion

In this paper, we study the geometric structure of multichannel sparse blind deconvolution over the unit sphere. Our theoretical analysis reveals that the local minima of a sparsity-promoting smooth objective function correspond to signed and shifted versions of the ground truth, and that the saddle points have strictly negative curvatures. Thanks to the favorable geometric properties of the objective, we can

Figure 6: Super-resolution fluorescence microscopy experiment using manifold gradient descent. (a) True images. (b) Observed images. (c) Recovered images using blind deconvolution. (d) Recovered images using non-blind deconvolution and a miscalibrated PSF. (e)(f)(g)(h) are average images of (a)(b)(c)(d). (i) True PSF. (j) Recovered PSF using blind deconvolution.
(k) Miscalibrated PSF used in non-blind deconvolution. All images in this figure are of the same size ($64 \times 64$).

simultaneously recover the unknown signal and unknown channels from convolutional measurements using manifold gradient descent with a random initialization. In practice, many convolutional measurement models are subsampled in the spatial domain (e.g., image super-resolution) or in the frequency domain (e.g., radio astronomy). Studying the effect of subsampling on the geometric structure of multichannel sparse blind deconvolution is an interesting problem for future work.

Acknowledgments

This work was supported in part by the National Science Foundation (NSF) under Grant IIS 14-47879. The authors would like to thank Ju Sun for helpful discussions about this paper. The manuscript benefited from constructive comments by the anonymous reviewers.

References

[1] S. Cho and S. Lee, "Fast motion deblurring," in ACM Transactions on Graphics (TOG), vol. 28, no. 5. ACM, 2009, p. 145.

[2] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, "Understanding blind deconvolution algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2354–2367, Dec 2011.

[3] L. Xu, S. Zheng, and J. Jia, "Unnatural l0 sparse representation for natural image deblurring," in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013, pp. 1107–1114.

[4] A. Ahmed, B. Recht, and J. Romberg, "Blind deconvolution using convex programming," IEEE Transactions on Information Theory, vol. 60, no. 3, pp. 1711–1732, March 2014.

[5] S. Ling and T. Strohmer, "Self-calibration and biconvex compressive sensing," Inverse Problems, vol. 31, no. 11, p. 115002, 2015.

[6] Y.
Chi, \u201cGuaranteed blind sparse spikes deconvolution via lifting and convex optimization,\u201d\nIEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 782\u2013794, June 2016.\n[7] X. Li, S. Ling, T. Strohmer, and K. Wei, \u201cRapid, robust, and reliable blind deconvolution via\n\nnonconvex optimization,\u201d arXiv preprint arXiv:1606.04933, 2016.\n\n[8] K. Lee, Y. Li, M. Junge, and Y. Bresler, \u201cBlind recovery of sparse signals from subsampled\nconvolution,\u201d IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 802\u2013821, Feb 2017.\n[9] W. Huang and P. Hand, \u201cBlind deconvolution by a steepest descent algorithm on a quotient\n\nmanifold,\u201d arXiv preprint arXiv:1710.03309, 2017.\n\n[10] Y. Zhang, Y. Lau, H.-w. Kuo, S. Cheung, A. Pasupathy, and J. Wright, \u201cOn the global geometry\nof sphere-constrained sparse blind deconvolution,\u201d in Proceedings of the IEEE Conference on\nComputer Vision and Pattern Recognition, 2017, pp. 4894\u20134902.\n\n[11] Y. Li, K. Lee, and Y. Bresler, \u201cIdenti\ufb01ability in blind deconvolution with subspace or sparsity\nconstraints,\u201d IEEE Transactions on Information Theory, vol. 62, no. 7, pp. 4266\u20134275, July\n2016.\n\n[12] \u2014\u2014, \u201cIdenti\ufb01ability and stability in blind deconvolution under minimal assumptions,\u201d IEEE\n\nTransactions on Information Theory, vol. 63, no. 7, pp. 4619\u20134633, July 2017.\n\n[13] L. Tong and S. Perreau, \u201cMultichannel blind identi\ufb01cation: from subspace to maximum likeli-\n\nhood methods,\u201d Proceedings of the IEEE, vol. 86, no. 10, pp. 1951\u20131968, Oct 1998.\n\n[14] H. She, R.-R. Chen, D. Liang, Y. Chang, and L. Ying, \u201cImage reconstruction from phased-array\ndata based on multichannel blind deconvolution,\u201d Magnetic resonance imaging, vol. 33, no. 9,\npp. 1106\u20131113, 2015.\n\n[15] H. Zhang, D. Wipf, and Y. 
Zhang, \u201cMulti-image blind deblurring using a coupled adaptive\nsparse prior,\u201d in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on.\nIEEE, 2013, pp. 1051\u20131058.\n\n[16] L. Tong, G. Xu, and T. Kailath, \u201cA new approach to blind identi\ufb01cation and equalization of\nmultipath channels,\u201d in [1991] Conference Record of the Twenty-Fifth Asilomar Conference on\nSignals, Systems Computers, Nov 1991, pp. 856\u2013860 vol.2.\n\n[17] E. Moulines, P. Duhamel, J. F. Cardoso, and S. Mayrargue, \u201cSubspace methods for the blind\nidenti\ufb01cation of multichannel \ufb01r \ufb01lters,\u201d IEEE Transactions on Signal Processing, vol. 43, no. 2,\npp. 516\u2013525, Feb 1995.\n\n[18] G. Xu, H. Liu, L. Tong, and T. Kailath, \u201cA least-squares approach to blind channel identi\ufb01cation,\u201d\n\nIEEE Transactions on Signal Processing, vol. 43, no. 12, pp. 2982\u20132993, Dec 1995.\n\n[19] M. I. Gurelli and C. L. Nikias, \u201cEvam: an eigenvector-based algorithm for multichannel blind\ndeconvolution of input colored signals,\u201d IEEE Transactions on Signal Processing, vol. 43, no. 1,\npp. 134\u2013149, Jan 1995.\n\n[20] K. Lee, F. Krahmer, and J. Romberg, \u201cSpectral methods for passive imaging: Non-asymptotic\n\nperformance and robustness,\u201d arXiv preprint arXiv:1708.04343, 2017.\n\n[21] C. R. Berger, S. Zhou, J. C. Preisig, and P. Willett, \u201cSparse channel estimation for multicarrier\nunderwater acoustic communication: From subspace methods to compressed sensing,\u201d IEEE\nTransactions on Signal Processing, vol. 58, no. 3, pp. 1708\u20131721, March 2010.\n\n[22] K. G. Sabra and D. R. Dowling, \u201cBlind deconvolution in ocean waveguides using arti\ufb01cial time\nreversal,\u201d The Journal of the Acoustical Society of America, vol. 116, no. 1, pp. 262\u2013271, 2004.\n[23] N. Tian, S.-H. Byun, K. Sabra, and J. 
Romberg, \u201cMultichannel myopic deconvolution in\nunderwater acoustic channels via low-rank recovery,\u201d The Journal of the Acoustical Society of\nAmerica, vol. 141, no. 5, pp. 3337\u20133348, 2017.\n\n[24] K. F. Kaaresen and T. Taxt, \u201cMultichannel blind deconvolution of seismic signals,\u201d Geophysics,\n\nvol. 63, no. 6, pp. 2093\u20132107, 1998.\n\n[25] D. R. Gitelman, W. D. Penny, J. Ashburner, and K. J. Friston, \u201cModeling regional and psy-\nchophysiologic interactions in fmri: the importance of hemodynamic deconvolution,\u201d Neuroim-\nage, vol. 19, no. 1, pp. 200\u2013207, 2003.\n\n[26] M. J. Rust, M. Bates, and X. Zhuang, \u201cSub-diffraction-limit imaging by stochastic optical\n\nreconstruction microscopy (storm),\u201d Nature methods, vol. 3, no. 10, p. 793, 2006.\n\n10\n\n\f[27] E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W.\nDavidson, J. Lippincott-Schwartz, and H. F. Hess, \u201cImaging intracellular \ufb02uorescent proteins at\nnanometer resolution,\u201d Science, vol. 313, no. 5793, pp. 1642\u20131645, 2006.\n\n[28] E. A. Mukamel, H. Babcock, and X. Zhuang, \u201cStatistical deconvolution for superresolution\n\n\ufb02uorescence microscopy,\u201d Biophysical journal, vol. 102, no. 10, pp. 2391\u20132400, 2012.\n\n[29] P. Sarder and A. Nehorai, \u201cDeconvolution methods for 3-d \ufb02uorescence microscopy images,\u201d\n\nIEEE Signal Processing Magazine, vol. 23, no. 3, pp. 32\u201345, May 2006.\n\n[30] L. Wang and Y. Chi, \u201cBlind deconvolution from multiple sparse inputs,\u201d IEEE Signal Processing\n\nLetters, vol. 23, no. 10, pp. 1384\u20131388, Oct 2016.\n\n[31] Y. Li, K. Lee, and Y. Bresler, \u201cBlind gain and phase calibration via sparse spectral methods,\u201d\n\nIEEE Transactions on Information Theory, 2018.\n\n[32] T. Strohmer, \u201cFour short stories about toeplitz matrix calculations,\u201d Linear Algebra and its\n\nApplications, vol. 343, pp. 
321\u2013344, 2002.\n\n[33] Y. Li, K. Lee, and Y. Bresler, \u201cIdenti\ufb01ability in bilinear inverse problems with applications\nto subspace or sparsity-constrained blind gain and phase calibration,\u201d IEEE Transactions on\nInformation Theory, vol. 63, no. 2, pp. 822\u2013842, Feb 2017.\n\n[34] \u2014\u2014, \u201cOptimal sample complexity for blind gain and phase calibration,\u201d IEEE Transactions on\n\nSignal Processing, vol. 64, no. 21, pp. 5549\u20135556, Nov 2016.\n\n[35] L. Balzano and R. Nowak, \u201cBlind calibration of sensor networks,\u201d in Proceedings of the 6th\ninternational conference on Information processing in sensor networks. ACM, 2007, pp.\n79\u201388.\n\n[36] C. Bilen, G. Puy, R. Gribonval, and L. Daudet, \u201cConvex optimization approaches for blind\nsensor calibration using sparsity,\u201d IEEE Transactions on Signal Processing, vol. 62, no. 18, pp.\n4847\u20134856, Sept 2014.\n\n[37] S. Ling and T. Strohmer, \u201cSelf-calibration via linear least squares,\u201d arXiv preprint, 2016.\n[38] J. Sun, Q. Qu, and J. Wright, \u201cComplete dictionary recovery over the sphere i: Overview and\nthe geometric picture,\u201d IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 853\u2013884,\nFeb 2017.\n\n[39] \u2014\u2014, \u201cA geometric analysis of phase retrieval,\u201d Foundations of Computational Mathematics,\n\nAug 2017. [Online]. Available: https://doi.org/10.1007/s10208-017-9365-9\n\n[40] S. Mei, Y. Bai, and A. Montanari, \u201cThe landscape of empirical risk for non-convex losses,\u201d\n\narXiv preprint arXiv:1607.06534, 2016.\n\n[41] J. Sun, Q. Qu, and J. Wright, \u201cComplete dictionary recovery over the sphere ii: Recovery by\nriemannian trust-region method,\u201d IEEE Transactions on Information Theory, vol. 63, no. 2, pp.\n885\u2013914, Feb 2017.\n\n[42] N. Boumal, P.-A. Absil, and C. 
Cartis, \u201cGlobal rates of convergence for nonconvex optimization\n\non manifolds,\u201d arXiv preprint arXiv:1605.08101, 2016.\n\n[43] J. D. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. I. Jordan, and B. Recht, \u201cFirst-order\n\nmethods almost always avoid saddle points,\u201d arXiv preprint arXiv:1710.07406, 2017.\n\n[44] J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht, \u201cGradient descent only converges to\n\nminimizers,\u201d in Conference on Learning Theory, 2016, pp. 1246\u20131257.\n\n[45] I. Panageas and G. Piliouras, \u201cGradient descent only converges to minimizers: Non-isolated\n\ncritical points and invariant regions,\u201d arXiv preprint arXiv:1605.00405, 2016.\n\n[46] R. Ge, F. Huang, C. Jin, and Y. Yuan, \u201cEscaping from saddle points\u2014online stochastic gradient\n\nfor tensor decomposition,\u201d in Conference on Learning Theory, 2015, pp. 797\u2013842.\n\n[47] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, \u201cHow to escape saddle points\n\nef\ufb01ciently,\u201d in International Conference on Machine Learning, 2017, pp. 1724\u20131732.\n\n[48] Z. Allen-Zhu, \u201cNatasha: Faster stochastic non-convex optimization via strongly non-convex\n\nparameter,\u201d arXiv preprint arXiv:1702.00763, 2017.\n\n[49] \u2014\u2014, \u201cNatasha 2: Faster non-convex optimization than sgd,\u201d arXiv preprint arXiv:1708.08694,\n\n2017.\n\n11\n\n\f[50] N. Agarwal, Z. Allen-Zhu, B. Bullins, E. Hazan, and T. Ma, \u201cFinding approximate local minima\nfaster than gradient descent,\u201d in Proceedings of the 49th Annual ACM SIGACT Symposium on\nTheory of Computing. ACM, 2017, pp. 1195\u20131199.\n\n[51] D. Goldfarb, C. Mu, J. Wright, and C. Zhou, \u201cUsing negative curvature in solving nonlinear\nprograms,\u201d Computational Optimization and Applications, vol. 68, no. 3, pp. 479\u2013502, Dec\n2017. [Online]. Available: https://doi.org/10.1007/s10589-017-9925-6\n\n[52] D. L. Donoho and M. 
Elad, \u201cOptimally sparse representation in general (nonorthogonal)\ndictionaries via (cid:96)1 minimization,\u201d Proceedings of the National Academy of Sciences, vol. 100,\nno. 5, pp. 2197\u20132202, feb 2003.\n\n[53] Y. Zhang, H.-W. Kuo, and J. Wright, \u201cStructured local optima in sparse blind deconvolution,\u201d\nin Proceedings of the 10th NIPS Workshop on Optimization for Machine Learning (OPTML),\n2017.\n\n[54] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds.\n\nPrinceton University Press, 2009.\n\n[55] A. Paulraj and T. Kailath, \u201cDirection of arrival estimation by eigenstructure methods with\nunknown sensor gain and phase,\u201d in ICASSP \u201985. IEEE International Conference on Acoustics,\nSpeech, and Signal Processing, vol. 10, Apr 1985, pp. 640\u2013643.\n\n[56] E. J. Cand\u00e8s, X. Li, and M. Soltanolkotabi, \u201cPhase retrieval via wirtinger \ufb02ow: Theory and\nalgorithms,\u201d IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1985\u20132007, April\n2015.\n\n[57] M. P. Wylie, S. Roy, and R. F. Schmitt, \u201cSelf-calibration of linear equi-spaced (les) arrays,\u201d in\n1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, April\n1993, pp. 281\u2013284 vol.1.\n\n[58] Y. C. Eldar, W. Liao, and S. Tang, \u201cSensor calibration for off-the-grid spectral estimation,\u201d arXiv\n\npreprint arXiv:1707.03378, 2017.\n\n[59] A. Beck and M. Teboulle, \u201cA fast iterative shrinkage-thresholding algorithm for linear inverse\n\nproblems,\u201d SIAM journal on imaging sciences, vol. 2, no. 1, pp. 183\u2013202, 2009.\n\n12\n\n\f", "award": [], "sourceid": 602, "authors": [{"given_name": "Yanjun", "family_name": "Li", "institution": "UIUC"}, {"given_name": "Yoram", "family_name": "Bresler", "institution": "University of Illinois"}]}