{"title": "Phase Retrieval using Alternating Minimization", "book": "Advances in Neural Information Processing Systems", "page_first": 2796, "page_last": 2804, "abstract": "Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers) information. Over the last two decades, a popular generic empirical approach to the many variants of this problem has been one of alternating minimization; i.e. alternating between estimating the missing phase information, and the candidate solution. In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem -- finding a vector $x$ from $y,A$, where $y = |A'x|$ and $|z|$ denotes a vector of element-wise magnitudes of $z$ -- under the assumption that $A$ is Gaussian. Empirically, our algorithm performs similarly to recently proposed convex techniques for this variant (which are based on \"lifting\" to a convex matrix problem) in sample complexity and robustness to noise. However, our algorithm is much more efficient and can scale to large problems. Analytically, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. 
Our work represents the only known proof of alternating minimization for any variant of phase retrieval problems in the non-convex setting.\"", "full_text": "Phase Retrieval using Alternating Minimization\n\nPraneeth Netrapalli\nDepartment of ECE\n\nThe University of Texas at Austin\n\nAustin, TX 78712\n\npraneethn@utexas.edu\n\nPrateek Jain\n\nMicrosoft Research India\n\nBangalore, India\n\nprajain@microsoft.com\n\nSujay Sanghavi\n\nDepartment of ECE\n\nThe University of Texas at Austin\n\nAustin, TX 78712\n\nsanghavi@mail.utexas.edu\n\nAbstract\n\nPhase retrieval problems involve solving linear equations, but with missing sign\n(or phase, for complex numbers). Over the last two decades, a popular generic em-\npirical approach to the many variants of this problem has been one of alternating\nminimization; i.e. alternating between estimating the missing phase information,\nand the candidate solution. In this paper, we show that a simple alternating min-\nimization algorithm geometrically converges to the solution of one such problem\n\n\u2013 \ufb01nding a vector x from y, A, where y = |AT x| and |z| denotes a vector of\n\nelement-wise magnitudes of z \u2013 under the assumption that A is Gaussian.\nEmpirically, our algorithm performs similar to recently proposed convex tech-\nniques for this variant (which are based on \u201clifting\u201d to a convex matrix problem)\nin sample complexity and robustness to noise. However, our algorithm is much\nmore ef\ufb01cient and can scale to large problems. Analytically, we show geometric\nconvergence to the solution, and sample complexity that is off by log factors from\nobvious lower bounds. We also establish close to optimal scaling for the case\nwhen the unknown vector is sparse. 
Our work represents the only known theoretical guarantee for alternating minimization for any variant of phase retrieval problems in the non-convex setting.

1 Introduction

In this paper we are interested in recovering a complex(1) vector x* ∈ C^n from magnitudes of its linear measurements. That is, for a_i ∈ C^n, if

y_i = |⟨a_i, x*⟩|,  for i = 1, . . . , m,   (1)

then the task is to recover x* using y and the measurement matrix A = [a_1 a_2 . . . a_m].

The above problem arises in many settings where it is harder / infeasible to record the phase of measurements, while recording the magnitudes is significantly easier. This problem, known as phase retrieval, is encountered in several applications in crystallography, optics, spectroscopy and tomography [14]. Moreover, the problem is broadly studied in the following two settings:

(i) The measurements in (1) correspond to the Fourier transform (the number of measurements here is equal to n) and there is some a priori information about the signal.

(ii) The set of measurements y are overcomplete (i.e., m > n), while some a priori information about the signal may or may not be available.

(1) Our results also cover the real case, i.e. where all quantities are real.

In the first case, various types of a priori information about the underlying signal such as positivity, magnitude information on the signal [11], sparsity [25] and so on have been studied. In the second case, algorithms for various measurement schemes such as Fourier oversampling [21], multiple random illuminations [4, 28] and wavelet transform [28] have been suggested.

By and large, the most well known methods for solving this problem are the error reduction algorithms due to Gerchberg and Saxton [13] and Fienup [11], and variants thereof.
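To make the Gaussian measurement model (1) concrete, here is a small sketch that samples an instance; this is our own illustration (the function name, shapes, and seeding are ours, not from the paper):

```python
import numpy as np

def make_instance(n, m, rng=None):
    """Sample a phase retrieval instance: Gaussian A, unit-norm x*, y = |A^H x*|.

    A is n x m with the measurement vectors a_i as columns; following the
    paper's notation, A^T denotes the Hermitian (conjugate) transpose.
    """
    rng = np.random.default_rng(rng)
    # standard complex Gaussian: independent N(0, 1) real and imaginary parts
    A = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
    x_star = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    x_star /= np.linalg.norm(x_star)
    y = np.abs(A.conj().T @ x_star)   # record magnitudes only; phase is lost
    return A, x_star, y
```

Note that y carries no phase information at all, which is exactly what makes the problem non-convex.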
These algorithms are alternating projection algorithms that iterate between the unknown phases of the measurements and the unknown underlying vector. Though the empirical performance of these algorithms has been well studied [11, 19], and they are used in many applications [20], there are not many theoretical guarantees regarding their performance.

More recently, a line of work [7, 6, 28] has approached this problem from a different angle, based on the realization that recovering x* is equivalent to recovering the rank-one matrix x* x*^T, i.e., its outer product. Inspired by the recent literature on trace norm relaxation of the rank constraint, they design SDPs to solve this problem. Refer to Section 1.1 for more details.

In this paper we go back to the empirically more popular ideology of alternating minimization; we develop a new alternating minimization algorithm, for which we show that (a) empirically, it noticeably outperforms convex methods, and (b) analytically, a natural resampled version of this algorithm requires O(n log^3 n) i.i.d. random Gaussian measurements to geometrically converge to the true vector.

Our contribution:

- The iterative part of our algorithm is implicit in previous work [13, 11, 28, 4]; the novelty in our algorithmic contribution is the initialization step which makes it more likely for the iterative procedure to succeed - see Figures 1 and 2.

- Our analytical contribution is the first theoretical guarantee regarding the convergence of alternating minimization for the phase retrieval problem in a non-convex setting.

- When the underlying vector is sparse, we design another algorithm that achieves a sample complexity of O((x*_min)^{-4} (log n + log^3 k)), where k is the sparsity and x*_min is the minimum non-zero entry of x*.
This algorithm also runs over C^n and scales much better than SDP based methods.

Besides being an empirically better algorithm for this problem, our work is also interesting in a broader sense: there are many problems in machine learning where the natural formulation of a problem is non-convex; examples include rank constrained problems, applications of EM algorithms etc., and alternating minimization has good empirical performance. However, the methods with the best (or only) analytical guarantees involve convex relaxations (e.g., by relaxing the rank constraint and penalizing the trace norm). In most of these settings, correctness of alternating minimization is an open question. We believe that our results in this paper are of interest, and may have implications, in this larger context.

The rest of the paper is organized as follows: In Section 1.1, we briefly review related work. We clarify our notation in Section 2. We present our algorithm in Section 3 and the main results in Section 4. We present our results for the sparse case in Section 5. Finally, we present experimental results in Section 6.

1.1 Related Work

Phase Retrieval via Non-Convex Procedures: In spite of the huge amount of work it has attracted, phase retrieval has been a long-standing open problem. Early work in this area focused on using holography to capture the phase information along with magnitude measurements [12]. However, computational methods for reconstruction of the signal using only magnitude measurements received a lot of attention, both due to their applicability in resolving spurious noise, fringes, optical system aberrations and so on, and due to difficulties in the implementation of interferometer setups [9]. Though such methods have been developed to solve this problem in various practical settings [8, 20], our theoretical understanding of this problem is still far from complete.
Many papers have focused on\ndetermining conditions under which (1) has a unique solution - see [24] and references therein.\nHowever, the uniqueness results of these papers do not resolve the algorithmic question of how to\n\ufb01nd the solution to (1).\n\nSince the seminal work of Gerchberg and Saxton [13] and Fienup [11], many iterated projection\nalgorithms have been developed targeted towards various applications [1, 10, 2]. [21] \ufb01rst suggested\nthe use of multiple magnitude measurements to resolve the phase problem. This approach has been\nsuccessfully used in many practical applications - see [9] and references there in. Following the\nempirical success of these algorithms, researchers were able to explain its success in some of the\ninstances [29] using Bregman\u2019s theory of iterated projections onto convex sets [3]. However, many\ninstances, such as the one we consider in this paper, are out of reach of this theory since they involve\nmagnitude constraints which are non-convex. To the best of our knowledge, there are no theoretical\nresults on the convergence of these approaches in a non-convex setting.\n\nPhase Retrieval via Convex Relaxation: An interesting recent approach for solving this problem\nformulates it as one of \ufb01nding the rank-one solution to a system of linear matrix equations. The\npapers [7, 6] then take the approach of relaxing the rank constraint by a trace norm penalty, making\nthe overall algorithm a convex program (called PhaseLift) over n \u00d7 n matrices. Another recent line\nof work [28] takes a similar but different approach : it uses an SDP relaxation (called PhaseCut) that\nis inspired by the classical SDP relaxation for the max-cut problem. To date, these convex methods\nare the only ones with analytical guarantees on statistical performance [5, 28] (i.e. the number m of\nmeasurements required to recover x\u2217) \u2013 under an i.i.d. random Gaussian model on the measurement\nvectors ai. 
However, by \u201clifting\u201d a vector problem to a matrix one, these methods lead to a much\nlarger representation of the state space, and higher computational cost as a result.\n\nSparse Phase Retrieval: A special case of the phase retrieval problem which has received a lot\nof attention recently is when the underlying signal x\u2217 is known to be sparse. Though this problem\nis closely related to the compressed sensing problem, lack of phase information makes this harder.\nHowever, the \u21131 regularization approach of compressed sensing has been successfully used in this\nsetting as well. In particular, if x\u2217 is sparse, then the corresponding lifted matrix x\u2217x\u2217T is also\nsparse. [22, 18] use this observation to design \u21131 regularized SDP algorithms for phase retrieval\nof sparse vectors. For random Gaussian measurements, [18] shows that \u21131 regularized PhaseLift\nrecovers x\u2217 correctly if the number of measurements is \u2126(k2 log n). By the results of [23], this\nresult is tight up to logarithmic factors for \u21131 and trace norm regularized SDP relaxations.\n\nAlternating Minimization (a.k.a. ALS): Alternating minimization has been successfully applied\nto many applications in the low-rank matrix setting. For example, clustering, sparse PCA, non-\nnegative matrix factorization, signed network prediction etc.\n- see [15] and references there in.\nHowever, despite empirical success, for most of the problems, there are no theoretical guarantees\nregarding its convergence except to a local minimum. The only exceptions are the results in [16, 15]\nwhich give provable guarantees for alternating minimization for the problems of matrix sensing and\nmatrix completion.\n\n2 Notation\n\nWe use bold capital letters (A, B etc.) for matrices, bold small case letters (x, y etc.) for vectors\nand non-bold letters (\u03b1, U etc.) for scalars. For every complex vector w \u2208 Cn, |w| \u2208 Rn denotes\nits element-wise magnitude vector. 
w^T and A^T denote the Hermitian transpose of the vector w and the matrix A respectively. e_1, e_2, etc. denote the canonical basis vectors in C^n. z̄ denotes the complex conjugate of the complex number z. In this paper we use the standard Gaussian (or normal) distribution over C^n. a is said to be distributed according to this distribution if a = a_1 + i a_2, where a_1 and a_2 are independent and are distributed according to N(0, I). We also define Ph(z) := z/|z| for every z ∈ C, and

dist(w_1, w_2) := ( 1 − |⟨w_1, w_2⟩|^2 / (‖w_1‖_2^2 ‖w_2‖_2^2) )^{1/2}

for every w_1, w_2 ∈ C^n. Finally, we use the shorthand wlog for without loss of generality and whp for with high probability.

3 Algorithm

In this section, we present our alternating minimization based algorithm for solving the phase retrieval problem. Let A ∈ C^{n×m} be the measurement matrix, with a_i as its ith column; similarly let y be the vector of recorded magnitudes. Then,

y = |A^T x*|.

Algorithm 1 AltMinPhase
input A, y, t_0
1: Initialize x_0 ← top singular vector of Σ_i y_i^2 a_i a_i^T
2: for t = 0, . . . , t_0 − 1 do
3:   C_{t+1} ← Diag(Ph(A^T x_t))
4:   x_{t+1} ← argmin_{x ∈ R^n} ‖A^T x − C_{t+1} y‖_2
5: end for
output x_{t_0}

Recall that, given y and A, the goal is to recover x*. If we had access to the true phase c* of A^T x* (i.e., c*_i = Ph(⟨a_i, x*⟩)) and m ≥ n, then our problem reduces to one of solving a system of linear equations:

C* y = A^T x*,

where C* := Diag(c*) is the diagonal matrix of phases. Of course we do not know C*, hence one approach to recovering x* is to solve:

argmin_{C, x} ‖A^T x − C y‖_2,   (2)

where x ∈ C^n and C ∈ C^{m×m} is a diagonal matrix with each diagonal entry of magnitude 1.
Note that the above problem is not convex since C is restricted to be a diagonal phase matrix and hence, one cannot use standard convex optimization methods to solve it.

Instead, our algorithm uses the well-known alternating minimization: alternatingly update x and C so as to minimize (2). Note that given C, the vector x can be obtained by solving the following least squares problem: min_x ‖A^T x − C y‖_2. Since the number of measurements m is larger than the dimensionality n and since each entry of A is sampled from independent Gaussians, A has full row rank with probability 1. Hence, the above least squares problem has a unique solution. On the other hand, given x, the optimal C is given by C = Diag(Ph(A^T x)).

While the above algorithm is simple and intuitive, it is known that with bad initial points, the solution might not converge to x*. In fact, this algorithm with a uniformly random initial point has been empirically evaluated, for example in [28], where it performs worse than SDP based methods. Moreover, since the underlying problem is non-convex, standard analysis techniques fail to guarantee convergence to the global optimum, x*. Hence, the key challenges here are: a) a good initialization step for this method, and b) establishing this method's convergence to x*.

We address the first key challenge in our AltMinPhase algorithm (Algorithm 1) by initializing x as the largest singular vector of the matrix S = (1/m) Σ_i y_i^2 a_i a_i^T. Theorem 4.1 shows that when A is sampled from the standard complex normal distribution, this initialization is accurate. In particular, if m ≥ C_1 n log^3 n for a large enough C_1 > 0, then whp we have ‖x_0 − x*‖_2 ≤ 1/100 (or any other constant).

Theorem 4.2 addresses the second key challenge and shows that a variant of AltMinPhase (see Algorithm 2) actually converges to the global optimum x* at a linear rate.
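For concreteness, here is a minimal numpy sketch of AltMinPhase (Algorithm 1): spectral initialization followed by alternating phase and least squares updates. This is our own illustration, not the authors' code (their experiments used Matlab), and it reads A^T as the conjugate transpose, as defined in Section 2:

```python
import numpy as np

def alt_min_phase(A, y, t0=100):
    """AltMinPhase sketch. A is n x m with measurement vectors a_i as columns;
    y = |A^H x*| holds the recorded magnitudes."""
    n, m = A.shape
    # Initialization: top eigenvector of S = (1/m) * sum_i y_i^2 a_i a_i^H
    S = (A * y**2) @ A.conj().T / m
    _, eigvecs = np.linalg.eigh(S)        # S is Hermitian PSD
    x = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    for _ in range(t0):
        c = np.exp(1j * np.angle(A.conj().T @ x))   # C <- Diag(Ph(A^H x))
        # x <- argmin_x || A^H x - C y ||_2  (least squares step)
        x, *_ = np.linalg.lstsq(A.conj().T, c * y, rcond=None)
    return x
```

Recovery is only possible up to a global phase, so accuracy should be measured with the dist(·, ·) of Section 2 rather than ‖x − x*‖_2 directly.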
See Section 4 for a detailed analysis of our algorithm.

We would like to stress that not only does a natural variant of our proposed algorithm have rigorous theoretical guarantees, it is also effective practically, as each of its iterations is fast, has a closed form solution and does not require SVD computation. AltMinPhase has similar statistical complexity to that of PhaseLift and PhaseCut while being much more efficient computationally. In particular, for accuracy ε, we need to solve each least squares problem only up to accuracy O(ε). Now, since the measurement matrix A is sampled from a Gaussian with m > Cn, it is well conditioned. Hence, using the conjugate gradient method, each such step takes O(mn log(1/ε)) time. When m = O(n) and we have geometric convergence, the total time taken by the algorithm is O(n^2 log^2(1/ε)). SDP based methods on the other hand require Ω(n^3/√ε) time. Moreover, our initialization step increases the likelihood of successful recovery as opposed to a random initialization (which has been considered so far in prior work). Refer to Figure 1 for an empirical validation of these claims.

Figure 1: Sample and time complexity of various methods for Gaussian measurement matrices A. Figure 1(a) compares the number of measurements required for successful recovery by various methods. We note that our initialization improves sample complexity over that of random initialization (AltMin (random init)) by a factor of 2. AltMinPhase requires a similar number of measurements as PhaseLift and PhaseCut.
Figure 1(b) compares the running time of various algorithms on a log-scale. Note that AltMinPhase is almost two orders of magnitude faster than PhaseLift and PhaseCut.

4 Main Results: Analysis

In this section we describe the main contribution of this paper: provable statistical guarantees for the success of alternating minimization in solving the phase recovery problem. To this end, we consider the setting where each measurement vector a_i is i.i.d. and is sampled from the standard complex normal distribution. We would like to stress that all the existing guarantees for phase recovery also use exactly the same setting [6, 5, 28]. Table 1 presents a comparison of the theoretical guarantees of Algorithm 2 as compared to PhaseLift and PhaseCut.

Table 1: Comparison of Algorithm 2 with PhaseLift and PhaseCut.

                 Sample complexity                        Comp. complexity
Algorithm 2      O(n log^3 n + log(1/ε) log log(1/ε))     O(n^2 (log^3 n + log^2(1/ε) log log(1/ε)))
PhaseLift [5]    O(n)                                     O(n^3/ε^2)
PhaseCut [28]    O(n)                                     O(n^3/√ε)

Though the sample complexity of Algorithm 2 is off by log factors from that of PhaseLift and PhaseCut, it is O(n) better than them in computational complexity. Note that we can solve the least squares problem in each iteration approximately by using the conjugate gradient method, which requires only O(mn) time.

Our proof for convergence of alternating minimization can be broken into two key results. We first show that if m ≥ Cn log^3 n, then whp the initialization step used by AltMinPhase returns x_0 which is at most a constant distance away from x*.
Furthermore, that constant can be controlled by using more samples (see Theorem 4.1).

We then show that if x_t is a fixed vector such that dist(x_t, x*) < c (small enough) and A is sampled independently of x_t with m > Cn (C large enough), then whp x_{t+1} satisfies: dist(x_{t+1}, x*) < (3/4) dist(x_t, x*) (see Theorem 4.2). Note that our analysis critically requires x_t to be "fixed" and be independent of the sample matrix A. Hence, we cannot re-use the same A in each iteration; instead, we need to resample A in every iteration. Using these results, we prove the correctness of Algorithm 2, which is a natural resampled version of AltMinPhase.

Algorithm 2 AltMinPhase with Resampling
input A, y, ε
1: t_0 ← c log(1/ε)
2: Partition y and (the corresponding columns of) A into t_0 + 1 equal disjoint sets: (y^0, A^0), (y^1, A^1), . . . , (y^{t_0}, A^{t_0})
3: x_0 ← top singular vector of Σ_ℓ (y^0_ℓ)^2 a^0_ℓ (a^0_ℓ)^T
4: for t = 0, . . . , t_0 − 1 do
5:   C_{t+1} ← Diag(Ph((A^{t+1})^T x_t))
6:   x_{t+1} ← argmin_{x ∈ R^n} ‖(A^{t+1})^T x − C_{t+1} y^{t+1}‖_2
7: end for
output x_{t_0}

We now present the two results mentioned above. For our proofs, wlog, we assume that ‖x*‖_2 = 1. Our first result guarantees a good initial vector.

Theorem 4.1. There exists a constant C_1 such that if m > (C_1/c^2) n log^3 n, then in Algorithm 2, with probability greater than 1 − 4/m^2 we have:

‖x_0 − x*‖_2 < c.

The second result proves geometric decay of error assuming a good initialization.

Theorem 4.2.
There exist constants c, ĉ and c̃ such that in iteration t of Algorithm 2, if dist(x_t, x*) < c and the number of columns of A^t is greater than ĉ n log(1/η), then, with probability more than 1 − η, we have:

dist(x_{t+1}, x*) < (3/4) dist(x_t, x*),  and  ‖x_{t+1} − x*‖_2 < c̃ dist(x_t, x*).

Proof. For simplicity of notation in the proof of the theorem, we will use A for A^{t+1}, C for C_{t+1}, x for x_t, x+ for x_{t+1}, and y for y^{t+1}. Now consider the update in the (t + 1)th iteration:

x+ = argmin_{x̃ ∈ R^n} ‖A^T x̃ − C y‖_2 = (A A^T)^{-1} A C y = (A A^T)^{-1} A D A^T x*,   (3)

where D is a diagonal matrix with D_ℓℓ := Ph( (a_ℓ^T x) · (a_ℓ^T x*)‾ ). Now (3) can be rewritten as:

x+ = x* + (A A^T)^{-1} A (D − I) A^T x*,   (4)

that is, x+ can be viewed as a perturbation of x* and the goal is to bound the error term (the second term above). We break the proof into two main steps:

1. ∃ a constant c_1 such that |⟨x*, x+⟩| ≥ 1 − c_1 dist(x, x*) (see Lemma A.2), and
2. |⟨z, x+⟩| ≤ (5/9) dist(x, x*), for all z s.t. z^T x* = 0 (see Lemma A.4).

Assuming the above two bounds and choosing c < 1/(100 c_1), we can prove the theorem:

dist(x+, x*)^2 < (25/81) · dist(x, x*)^2 / (1 − c_1 dist(x, x*))^2 ≤ (9/16) dist(x, x*)^2,

proving the first part of the theorem. The second part follows easily from (3) and Lemma A.2.

Intuition and key challenge: If we look at step 6 of Algorithm 2, we see that, for the measurements, we use magnitudes calculated from x* and phases calculated from x. Intuitively, this means that we are trying to push x+ towards x* (since we use its magnitudes) and x (since we use its phases) at the same time.
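The resampling device of Algorithm 2 — a fresh, disjoint block of measurements in every iteration, so that x_t stays independent of the matrix it is multiplied with — can be sketched as follows (our own illustration; the block layout and names are ours):

```python
import numpy as np

def alt_min_phase_resampled(A, y, t0):
    """Algorithm 2 sketch: partition the m measurements into t0 + 1 disjoint
    blocks; block 0 drives the spectral init, block t+1 drives iteration t."""
    n, m = A.shape
    blocks = np.array_split(np.arange(m), t0 + 1)
    A0, y0 = A[:, blocks[0]], y[blocks[0]]
    S = (A0 * y0**2) @ A0.conj().T / len(blocks[0])   # spectral init on block 0
    x = np.linalg.eigh(S)[1][:, -1]                   # top eigenvector of S
    for t in range(t0):
        At, yt = A[:, blocks[t + 1]], y[blocks[t + 1]]
        c = np.exp(1j * np.angle(At.conj().T @ x))    # phases from the fresh block
        x, *_ = np.linalg.lstsq(At.conj().T, c * yt, rcond=None)
    return x
```

Each block must itself be large enough (Ω(n) columns, per Theorem 4.2) for the per-iteration contraction to hold.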
The key intuition behind the success of this procedure is that the push towards x* is stronger than the push towards x, when x is close to x*. The key lemma that captures this effect is stated below:

Lemma 4.3. Let w_1 and w_2 be two independent standard complex Gaussian random variables(2). Let

U = |w_1| w_2 ( Ph( 1 + (√(1 − α^2) w_2) / (α |w_1|) ) − 1 ).

Fix δ > 0. Then, there exists a constant γ > 0 such that if √(1 − α^2) < γ, then: E[U] ≤ (1 + δ) √(1 − α^2).

(2) z is standard complex Gaussian if z = z_1 + i z_2 where z_1 and z_2 are independent standard normal random variables.

See Appendix A for a proof of the above lemma and how we use it to prove Theorem 4.2.

Combining Theorems 4.1 and 4.2, and a simple observation that ‖x_{t_0} − x*‖_2 < c̃ dist(x_{t_0}, x*) for a constant c̃, we can establish the correctness of Algorithm 2.

Theorem 4.4. Suppose the measurement vectors in (1) are independent standard complex normal vectors. For every η > 0, there exists a constant c such that if m > c(n log^3 n + log(1/ε) log log(1/ε)), then, with probability greater than 1 − η, Algorithm 2 outputs x_{t_0} such that ‖x_{t_0} − x*‖_2 < ε.

5 Sparse Phase Retrieval

In this section, we consider the case where x* is known to be sparse, with sparsity k. A natural and practical question to ask here is: can the sample and computational complexity of the recovery algorithm be improved when k ≪ n?

Recently, [18] studied this problem for Gaussian A and showed that for ℓ1 regularized PhaseLift, m = O(k^2 log n) samples suffice for exact recovery of x*. However, the computational complexity of this algorithm is still O(n^3/ε^2).

In this section, we provide a simple extension of our AltMinPhase algorithm that we call SparseAltMinPhase, for the case of sparse x*. The main idea behind our algorithm is to first recover the support of x*. Then, the problem reduces to phase retrieval of a k-dimensional signal. We then solve the reduced problem using Algorithm 2. The pseudocode for SparseAltMinPhase is presented in Algorithm 3. Table 2 provides a comparison of Algorithm 3 with ℓ1-regularized PhaseLift in terms of sample complexity as well as computational complexity.

Algorithm 3 SparseAltMinPhase
input A, y, k
1: S ← top-k argmax_{j ∈ [n]} Σ_{i=1}^m |a_ij y_i|  {Pick the indices of the k largest absolute value inner products}
2: Apply Algorithm 2 on A_S, y_S and output the resulting vector with elements in S^c set to zero.

Table 2: Comparison of Algorithm 3 with ℓ1-PhaseLift when x*_min = Ω(1/√k).

                    Sample complexity                         Comp. complexity
Algorithm 3         O(k(k log n + log(1/ε) log log(1/ε)))     O(k^2 (kn log n + log^2(1/ε) log log(1/ε)))
ℓ1-PhaseLift [18]   O(k^2 log n)                              O(n^3/ε^2)

Note that the complexity of Algorithm 3 is dominated by the support finding step. If k = O(1), Algorithm 3 runs in quasi-linear time.

The following lemma shows that if the number of measurements is large enough, step 1 of SparseAltMinPhase recovers the support of x* correctly.

Lemma 5.1. Suppose x* is k-sparse with support S and ‖x*‖_2 = 1. If a_i are standard complex Gaussian random vectors and m > (c/(x*_min)^4) log(n/δ), then Algorithm 3 recovers S with probability greater than 1 − δ, where x*_min is the minimum non-zero entry of x*.

The key step of our proof is to show that if j ∈ supp(x*), then the random variable Z_j = Σ_i |a_ij y_i| has significantly higher mean than for the case when j ∉ supp(x*).
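The scoring rule just described fits in a few lines (our own illustration; with a_i as the columns of A, the score of coordinate j is Z_j = Σ_i |a_ij y_i|):

```python
import numpy as np

def recover_support(A, y, k):
    """Step 1 of SparseAltMinPhase (sketch): score each coordinate j by
    Z_j = sum_i |a_ij| * y_i (y_i >= 0) and keep the k highest-scoring ones."""
    Z = np.abs(A) @ y                  # length-n vector of coordinate scores
    return np.sort(np.argsort(Z)[-k:])  # indices of the k largest Z_j, sorted
```

The reduced k-dimensional instance (A restricted to these rows, with the same y) is then handed to Algorithm 2.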
Now, by applying appropriate concentration bounds, we can ensure that min_{j ∈ supp(x*)} Z_j > max_{j ∉ supp(x*)} Z_j, and hence our algorithm never picks an element outside the true support set supp(x*). See Appendix B for a detailed proof of the above lemma.

The correctness of Algorithm 3 is now a direct consequence of Lemma 5.1 and Theorem 4.4. For the special case where each non-zero value in x* is from {−1/√k, 1/√k}, we have the following corollary:

Corollary 5.2. Suppose x* is k-sparse with non-zero elements ±1/√k. If the number of measurements m > c(k^2 log(n/δ) + k log^2 k + k log(1/ε)), then Algorithm 3 will recover x* up to accuracy ε with probability greater than 1 − δ.

Figure 2: (a) & (b): Sample and time complexity for successful recovery using random Gaussian illumination filters. Similar to Figure 1, we observe that AltMinPhase needs a similar number of filters (J) as PhaseLift and PhaseCut, but is computationally much more efficient. We also see that AltMinPhase performs better than AltMin (random init). (c): Recovery error ‖x − x*‖_2 incurred by various methods with an increasing amount of noise (σ). AltMinPhase and PhaseCut perform comparably while PhaseLift incurs significantly larger error.

6 Experiments

In this section, we present an experimental evaluation of AltMinPhase (Algorithm 1) and compare its performance with the SDP based methods PhaseLift [6] and PhaseCut [28]. We also empirically demonstrate the advantage of our initialization procedure over random initialization (denoted by AltMin (random init)), which has thus far been considered in the literature [13, 11, 28, 4].
AltMin (random init) is the same as AltMinPhase except that step 1 of Algorithm 1 is replaced with: x_0 ← uniformly random vector from the unit sphere.

We first choose x* uniformly at random from the unit sphere. In the noiseless setting, a trial is said to succeed if the output x satisfies ‖x − x*‖_2 < 10^{-2}. For a given dimension, we do a linear search for the smallest m (number of samples) such that the empirical success ratio over 20 runs is at least 0.8. We implemented our methods in Matlab, while we obtained the code for PhaseLift and PhaseCut from the authors of [22] and [28] respectively.

We now present results from our experiments in three different settings.

Independent Random Gaussian Measurements: Each measurement vector a_i is generated from the standard complex Gaussian distribution. This measurement scheme was first suggested by [6] and, to date, this is the only scheme with theoretical guarantees.

Multiple Random Illumination Filters: We now present our results for the setting where the measurements are obtained using multiple illumination filters; this setting was suggested by [4]. In particular, choose J vectors z^(1), . . . , z^(J) and compute the following discrete Fourier transforms:

x̂^(u) = DFT(x* .* z^(u)),

where .* denotes component-wise multiplication. Our measurements will then be the magnitudes of the components of the vectors x̂^(1), . . . , x̂^(J). The above measurement scheme can be implemented by modulating the light beam or by the use of masks; see [4] for more details.

We again perform the same experiments as in the previous setting. Figures 2 (a) and (b) present the results.
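Under our reading of this scheme (a sketch with assumed names; [4] takes the filters z^(u) to be random), the masked-DFT measurements can be generated as:

```python
import numpy as np

def coded_magnitudes(x, filters):
    """Masked-DFT measurements: for each filter z, record |DFT(x .* z)|,
    where .* is the component-wise product."""
    return np.stack([np.abs(np.fft.fft(x * z)) for z in filters])

# example: J = 4 random complex Gaussian filters for a length-n signal
rng = np.random.default_rng(0)
n, J = 32, 4
filters = rng.standard_normal((J, n)) + 1j * rng.standard_normal((J, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
Y = coded_magnitudes(x, filters)   # shape (J, n): J * n magnitude measurements
```

Each filter contributes n magnitude measurements, so J plays the role of the oversampling factor m/n in the Gaussian setting.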
We again see that the measurement complexity of AltMinPhase is similar to that of PhaseCut\nand PhaseLift, but AltMinPhase is orders of magnitude faster than PhaseLift and PhaseCut.\n\nNoisy Phase Retrieval: Finally, we study our method in the following noisy measurement scheme:\n\nyi = |hai, x\u2217 + wii|\n\nfor i = 1, . . . , m,\n\n(5)\n\nwhere wi is the noise in the i-th measurement and is sampled from N (0, \u03c32). We \ufb01x n = 64\nand m = 6n. We then vary the amount of noise added \u03c3 and measure the \u21132 error in recovery,\ni.e., kx \u2212 x\u2217k2, where x is the recovered vector. Figure 2(c) compares the performance of various\n\nmethods with varying amount of noise. We observe that our method outperforms PhaseLift and has\nsimilar recovery error as PhaseCut.\n\nAcknowledgments\n\nS. Sanghavi would like to acknowledge support from NSF grants 0954059, 1302435, ARO grant\nW911NF-11-1-0265 and a DTRA YIP award.\n\n8\n\n\fReferences\n\n[1] J. Abrahams and A. Leslie. Methods used in the structure determination of bovine mitochondrial f1\n\natpase. Acta Crystallographica Section D: Biological Crystallography, 52(1):30\u201342, 1996.\n\n[2] H. H. Bauschke, P. L. Combettes, and D. R. Luke. Hybrid projection\u2013re\ufb02ection method for phase retrieval.\n\nJOSA A, 20(6):1025\u20131034, 2003.\n\n[3] L. Bregman. Finding the common point of convex sets by the method of successive projection.(russian).\n\nIn Dokl. Akad. Nauk SSSR, volume 162, pages 487\u2013490, 1965.\n\n[4] E. J. Candes, Y. C. Eldar, T. Strohmer, and V. Voroninski. Phase retrieval via matrix completion. SIAM\n\nJournal on Imaging Sciences, 6(1):199\u2013225, 2013.\n\n[5] E. J. Candes and X. Li. Solving quadratic equations via phaselift when there are about as many equations\n\nas unknowns. arXiv preprint arXiv:1208.6247, 2012.\n\n[6] E. J. Candes, T. Strohmer, and V. Voroninski. Phaselift: Exact and stable signal recovery from magnitude\n\nmeasurements via convex programming. 
Communications on Pure and Applied Mathematics, 2012.

[7] A. Chai, M. Moscoso, and G. Papanicolaou. Array imaging using intensity-only measurements. Inverse Problems, 27(1):015005, 2011.

[8] J. C. Dainty and J. R. Fienup. Phase retrieval and image reconstruction for astronomy. In Image Recovery: Theory and Application, ed. by H. Stark, Academic Press, San Diego, pages 231-275, 1987.

[9] H. Duadi, O. Margalit, V. Mico, J. A. Rodrigo, T. Alieva, J. Garcia, and Z. Zalevsky. Digital holography and phase retrieval. In Holography, Research and Technologies. InTech, 2011.

[10] V. Elser. Phase retrieval by iterated projections. JOSA A, 20(1):40-55, 2003.

[11] J. R. Fienup. Phase retrieval algorithms: a comparison. Applied Optics, 21(15):2758-2769, 1982.

[12] D. Gabor. A new microscopic principle. Nature, 161(4098):777-778, 1948.

[13] R. W. Gerchberg and W. O. Saxton. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik, 35:237, 1972.

[14] N. E. Hurt. Phase Retrieval and Zero Crossings: Mathematical Methods in Image Reconstruction, volume 52. Kluwer Academic, 2001.

[15] P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. arXiv preprint arXiv:1212.0467, 2012.

[16] R. H. Keshavan. Efficient algorithms for collaborative filtering. PhD thesis, Stanford University, 2012.

[17] W. V. Li and A. Wei. Gaussian integrals involving absolute value functions. In Proceedings of the Conference in Luminy, 2009.

[18] X. Li and V. Voroninski. Sparse signal recovery from quadratic measurements via convex programming. arXiv preprint arXiv:1209.4785, 2012.

[19] S. Marchesini. Invited article: A unified evaluation of iterative projection algorithms for phase retrieval. Review of Scientific Instruments, 78(1):011301, 2007.

[20] J. Miao, P. Charalambous, J.
Kirz, and D. Sayre. Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens. Nature, 400(6742):342-344, 1999.

[21] D. Misell. A method for the solution of the phase problem in electron microscopy. Journal of Physics D: Applied Physics, 6(1):L6, 1973.

[22] H. Ohlsson, A. Y. Yang, R. Dong, and S. S. Sastry. Compressive phase retrieval from squared output measurements via semidefinite programming. arXiv preprint arXiv:1111.6323, 2011.

[23] S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi. Simultaneously structured models with application to sparse and low-rank matrices. arXiv preprint arXiv:1212.3753, 2012.

[24] J. L. Sanz. Mathematical considerations for the problem of Fourier transform phase retrieval from magnitude. SIAM Journal on Applied Mathematics, 45(4):651-664, 1985.

[25] Y. Shechtman, Y. C. Eldar, A. Szameit, and M. Segev. Sparsity based sub-wavelength imaging with partially incoherent light via quadratic compressed sensing. arXiv preprint arXiv:1104.4406, 2011.

[26] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.

[27] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.

[28] I. Waldspurger, A. d'Aspremont, and S. Mallat. Phase recovery, MaxCut and complex semidefinite programming. arXiv preprint arXiv:1206.0102, 2012.

[29] D. C. Youla and H. Webb. Image restoration by the method of convex projections: Part 1 - Theory.
IEEE Transactions on Medical Imaging, 1(2):81-94, 1982.