{"title": "(Probably) Concave Graph Matching", "book": "Advances in Neural Information Processing Systems", "page_first": 408, "page_last": 418, "abstract": "In this paper we address the graph matching problem. Following the recent works of \\cite{zaslavskiy2009path,Vestner2017} we analyze and generalize the idea of concave relaxations. We introduce the concepts of \\emph{conditionally concave} and \\emph{probably conditionally concave} energies on polytopes and show that they encapsulate many instances of the graph matching problem, including matching Euclidean graphs and graphs on surfaces. We further prove that local minima of probably conditionally concave energies on general matching polytopes (\\eg, doubly stochastic) are with high probability extreme points of the matching polytope (\\eg, permutations).", "full_text": "(Probably) Concave Graph Matching\n\nHaggai Maron\n\nWeizmann Institute of Science\n\nRehovot, Israel\n\nYaron Lipman\n\nWeizmann Institute of Science\n\nRehovot, Israel\n\nhaggai.maron@weizmann.ac.il\n\nyaron.lipman@weizmann.ac.il\n\nAbstract\n\nIn this paper, we address the graph matching problem. Following the recent works\nof Zaslavskiy et al. (2009); Vestner et al. (2017) we analyze and generalize the\nidea of concave relaxations. We introduce the concepts of conditionally concave\nand probably conditionally concave energies on polytopes and show that they\nencapsulate many instances of the graph matching problem, including matching\nEuclidean graphs and graphs on surfaces. 
We further prove that local minima of probably conditionally concave energies on general matching polytopes (e.g., doubly stochastic) are with high probability extreme points of the matching polytope (e.g., permutations).

1 Introduction

Graph matching is a generic and popular modeling tool for problems in computational sciences such as computer vision (Berg et al., 2005; Zhou and De la Torre, 2012; Rodola et al., 2013; Bernard et al., 2017), computer graphics (Funkhouser and Shilane, 2006; Kezurer et al., 2015), medical imaging (Guo et al., 2013), and machine learning (Umeyama, 1988; Huet et al., 1999; Cour et al., 2007). In general, graph matching refers to several different optimization problems of the form:

    min_X E(X)  s.t.  X ∈ F    (1)

where F ⊂ R^{n×n₀} is a collection of matchings between vertices of two graphs G_A and G_B, and E(X) = [X]^T M [X] + a^T [X] is usually a quadratic function in X ∈ R^{n×n₀} ([X] ∈ R^{nn₀×1} is its column stack). Often, M quantifies the discrepancy between edge affinities exerted by the matching X. Edge affinities are represented by symmetric matrices A ∈ R^{n×n}, B ∈ R^{n₀×n₀}. Maybe the most common instantiation of (1) is

    E₁(X) = ‖AX − XB‖²_F    (2)

and F = Π_n, the matrix group of n×n permutations. The permutations X ∈ Π_n represent bijections between the set of (n) vertices of G_A and the set of (n) vertices of G_B. We denote this problem as GM. From a computational point of view, this problem is equivalent to the quadratic assignment problem, and as such is an NP-hard problem (Burkard et al., 1998). A popular way of obtaining approximate solutions is by relaxing its combinatorial constraints (Loiola et al., 2007).
A standard relaxation of this formulation (e.g., Almohamad and Duffuaa (1993); Aflalo et al.
(2015); Fiori and Sapiro (2015)) is achieved by replacing Π_n with its convex hull, namely the set of doubly-stochastic matrices DS = hull(F) = {X ∈ R^{n×n} | X1 = 1, X^T 1 = 1, X ≥ 0}. The main advantage of this formulation is the convexity of the energy E₁; the main drawback is that often the minimizer is not a permutation, and simply projecting the solution onto Π_n does not take the energy into account, resulting in a suboptimal solution. The prominent Path Following algorithm (Zaslavskiy et al., 2009) suggests a better solution of continuously deforming E₁ into a concave energy E′ that coincides (up to an additive constant) with E₁ over the permutations. The concave energy E′ is called a concave relaxation and enjoys three key properties: (i) Its solution set is the same as that of the GM problem. (ii) Its local optima are all permutations, so no projection of the local optima onto the permutations is required. (iii) For every descent direction, a maximal step is always guaranteed to reduce the energy most.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Dym et al. (2017); Bernard et al. (2017) suggest a similar strategy but starting with a tighter convex relaxation. Another set of works (Vogelstein et al., 2015; Lyzinski et al., 2016; Vestner et al., 2017; Boyarski et al., 2017) have considered the energy

    E₂(X) = −tr(BX^T AX)    (3)

over the doubly-stochastic matrices DS as well. Note that both energies E₁, E₂ are identical (up to an additive constant) over the permutations and hence both are considered relaxations. However, in contrast to E₁, E₂ is in general indefinite, resulting in a non-convex relaxation. Vogelstein et al. (2015); Lyzinski et al.
(2016) suggest locally optimizing this relaxation with the Frank-Wolfe algorithm and motivate it by proving that, for the class of ρ-correlated Bernoulli adjacency matrices A, B, the optimal solution of the relaxation almost always coincides with the (unique in this case) GM optimal solution. Vestner et al. (2017); Boyarski et al. (2017) were the first to make the useful observation that E₂ is itself a concave relaxation for some important cases of affinities such as heat kernels and Gaussians. This leads to an efficient local optimization using the Frank-Wolfe algorithm and specialized linear assignment solvers (e.g., Bernard et al. (2016)).
In this paper, we analyze and generalize the above works and introduce the concepts of conditionally concave and probably conditionally concave energies E(X). A conditionally concave energy E(X) means that the restriction of the Hessian M of the energy E to the linear space

    lin(DS) = {X ∈ R^{n×n} | X1 = 0, X^T 1 = 0}    (4)

is negative definite. Note that lin(DS) is the linear part of the affine hull of the doubly-stochastic matrices, denoted aff(DS). We will use the notation M|_{lin(DS)} to refer to this restriction of M, and consequently M|_{lin(DS)} ≺ 0 means v^T M v < 0 for all 0 ≠ v ∈ lin(DS). Our first result is proving there is a large class of affinity matrices resulting in conditionally concave E₂. In particular, affinity matrices constructed using positive or negative definite functions¹ will be conditionally concave.
Theorem 1. Let Φ : R^d → R, Ψ : R^s → R be both conditionally positive (or negative) definite functions of order 1.
For any pair of graphs with affinity matrices A, B ∈ R^{n×n} so that

    A_ij = Φ(x_i − x_j),  B_ij = Ψ(y_i − y_j)    (5)

for some arbitrary {x_i}_{i∈[n]} ⊂ R^d, {y_i}_{i∈[n]} ⊂ R^s, the energy E₂(X) is conditionally concave, i.e., its Hessian satisfies M|_{lin(DS)} ≺ 0.
One useful application of this theorem is in matching graphs with Euclidean affinities, since Euclidean distances are conditionally negative definite of order 1 (Wendland, 2004). That is, the affinities are Euclidean distances of points in Euclidean spaces of arbitrary dimensions,

    A_ij = ‖x_i − x_j‖₂,  B_ij = ‖y_i − y_j‖₂,    (6)

where {x_i}_{i∈[n]} ⊂ R^d, {y_i}_{i∈[n]} ⊂ R^s. This class contains, besides Euclidean graphs, also affinities made out of distances that can be isometrically embedded in Euclidean spaces, such as diffusion distances (Coifman and Lafon, 2006), distances induced by deep learning embeddings (e.g., Schroff et al. (2015)), and Mahalanobis distances. Furthermore, as shown in Bogomolny et al. (2007), the spherical distance, A_ij = d_{S^d}(x_i, x_j), is also conditionally negative definite over the sphere and therefore can be used in the context of the theorem as well.
Second, we generalize the notion of conditionally concave energies to probably conditionally concave energies. Intuitively, the energy E is called probably conditionally concave if it is rare to find a linear subspace D of lin(DS) so that the restriction of E to it is convex, that is, M|_D ⪰ 0. The primary motivation in considering probably conditionally concave energies is that they enjoy (with high probability) the same properties as the conditionally concave energies, i.e., (i)-(iii).
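Theorem 1 can be sanity-checked numerically. The sketch below (our own illustration, not code from the paper) builds Euclidean distance affinities as in (6), restricts the Hessian −B ⊗ A of E₂ to lin(DS) using an orthonormal basis F of 1⊥ (so that F ⊗ F spans lin(DS), cf. Lemma 1 in Section 2), and checks that every eigenvalue of the restriction is negative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Distinct random point sets; affinities are Euclidean distances as in (6).
x = rng.standard_normal((n, 3))
y = rng.standard_normal((n, 2))
A = np.linalg.norm(x[:, None] - x[None, :], axis=2)
B = np.linalg.norm(y[:, None] - y[None, :], axis=2)

# Orthonormal basis F of 1-perp = {v : v^T 1 = 0}: QR of the centering
# projector I - (1/n) 1 1^T, keeping the first n-1 columns.
Q, _ = np.linalg.qr(np.eye(n) - np.ones((n, n)) / n)
F = Q[:, : n - 1]

# Hessian of E2 restricted to lin(DS): -(F^T B F) kron (F^T A F).
H = -np.kron(F.T @ B @ F, F.T @ A @ F)
print(np.linalg.eigvalsh(H).max())  # strictly negative: E2 is conditionally concave
```

Since both restricted affinity matrices are negative definite for distinct points, their Kronecker product is positive definite, and its negation is negative definite, in line with the proof of Theorem 1.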
Therefore, locally minimizing probably conditionally concave energies over F can also be done with the Frank-Wolfe algorithm, with guarantees (in probability) on the feasibility of both the optimization result and the solution set of this energy.
A surprising fact we show is that probably conditionally concave energies are quite common and include Hessian matrices M with almost the same ratio of positive to negative eigenvalues. The following theorem bounds the probability of finding uniformly at random a linear subspace D such that the restriction of M ∈ R^{m×m} to D is convex, i.e., M|_D ⪰ 0. The set of d-dimensional linear subspaces of R^m is called the Grassmannian Gr(d, m); it has a compact differential manifold structure and a uniform measure Pr.

¹In a nutshell, positive (negative) definite functions are functions that when applied to differences of vectors produce positive (negative) definite matrices when restricted to certain linear subspaces; this notion will be formally introduced and defined in Section 2.

Theorem 2. Let M ∈ R^{m×m} be a symmetric matrix with eigenvalues λ₁, ..., λ_m. Then, for all t ∈ (0, 1/(2λ_max)):

    Pr(M|_D ⪰ 0) ≤ ∏_{i=1}^m (1 − 2tλ_i)^{−d/2},    (7)

where M|_D is the restriction of M to the d-dimensional linear subspace defined by D ∈ Gr(d, m) and the probability is taken with respect to the Haar probability measure on Gr(d, m).
For the case d = 1 the probability of M|_D ⪰ 0 can be interpreted via distributions of quadratic forms. Previous works aimed at calculating and bounding similar probabilities (Imhof, 1961; Rudelson et al., 2013) but in different (more general) settings, providing less explicit bounds.
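The d = 1 case of Theorem 2 is easy to probe numerically. The snippet below (our own sketch) takes a spectrum with 51% eigenvalues −1 and 49% eigenvalues +1, evaluates the Chernoff bound on a grid of t, and compares it to a Monte Carlo estimate of Pr(v^T M v ≥ 0); by the rotation invariance used in the proof, sampling a Haar direction reduces to testing the sign of Σᵢ λᵢ gᵢ² for Gaussian g:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 10_000  # comparable to matching problems with n ~ 100, m = (n - 1)^2
n_neg = int(0.51 * m)
lam = np.concatenate([-np.ones(n_neg), np.ones(m - n_neg)])

# Chernoff bound of Theorem 2 for d = 1, minimized over a grid of t in (0, 1/2).
ts = np.linspace(1e-3, 0.499, 500)
bound = min(np.exp(-0.5 * np.log1p(-2 * t * lam).sum()) for t in ts)

# Monte Carlo: for a Haar-uniform direction, the event v^T M v >= 0 has the
# same probability as sum_i lam_i g_i^2 >= 0 with g ~ N(0, I) (eigenbasis).
g = rng.standard_normal((500, m))
emp = np.mean((lam * g * g).sum(axis=1) >= 0)
print(f"empirical {emp:.3f} <= bound {bound:.3f}")
```

At this (moderate) dimension the bound is already below 1, and the empirical frequency of convex directions sits well below the bound; both shrink exponentially as m grows.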
As we will see, the case d > 1 quantifies the chances of local minima residing at high-dimensional faces of hull(F).
As a simple use-case of Theorem 2, consider a matrix where 51% of the eigenvalues are −1 and 49% are +1; the probability of finding a convex direction of this matrix, when the direction is uniformly distributed, is exponentially low in the dimension of the matrix. As we (empirically) show, one class of problems that in practice presents probably conditionally concave E₂ is when the affinities A, B describe geodesic distances on surfaces.
Probable concavity can be further used to prove theorems regarding the likelihood of finding a local minimum outside the matching set F when minimizing E over a relaxed matching polytope hull(F). We will show the existence of a rather general probability space (in fact, a family) (Ω_m, Pr) of Hessians M ∈ Ω_m ⊂ R^{m×m} with a natural probability measure Pr, so that the probability of local minima of E(X) lying outside F is very small. This result is stated and proved in Theorem 3. An immediate conclusion of this result provides a proof of a probabilistic version of properties (i) and (ii) stated above for energies drawn from this distribution. In particular, the global minima of E(X) over DS coincide with those over Π_n with high probability. The following theorem provides a general result in the flavor of Lyzinski et al. (2016) for a large class of quadratic energies.
Theorem 4. Let E be a quadratic energy with Hessian drawn from the probability space (Ω_m, Pr). The chance that a local minimum of min_{X∈DS} E(X) is outside Π_n is extremely small, bounded by exp(−c₁n²), for some constant c₁ > 0.
Third, when the energy of interest E(X) is not probably conditionally concave over lin(F) there is no guarantee that the local optimum of E over hull(F) is in F.
We devise a simple variant of the Frank-Wolfe algorithm, replacing the standard line search with a concave search. Concave search means subtracting from the energy E convex parts that are constant on F (i.e., relaxations) until an energy-reducing step is found.

2 Conditionally concave energies

We are interested in the application of the Frank-Wolfe algorithm (Frank and Wolfe, 1956) for locally optimizing E₂ (potentially with a linear term) from (3) over the doubly-stochastic matrices:

    min_X E(X)    (8a)
    s.t.  X ∈ DS    (8b)

where E(X) = −[X]^T(B ⊗ A)[X] + a^T[X]. For completeness, we include a simple pseudo-code:

input: X0 ∈ hull(F)
while not converged do
    compute step: X1 = argmin_{X∈DS} −2[X0]^T(B ⊗ A)[X] + a^T[X];
    line-search: t0 = argmin_{t∈[0,1]} E((1 − t)X0 + tX1);
    apply step: X0 = (1 − t0)X0 + t0X1;
end

Algorithm 1: Frank-Wolfe algorithm.

Definition 1. We say that E(X) is conditionally concave if it is concave when restricted to the linear space lin(F), the linear part of the affine hull hull(F).

If E(X) is conditionally concave, then properties (i)-(iii) of concave relaxations detailed above hold. In particular, Algorithm 1 would always accept t0 = 1 as the optimal step, and therefore it will produce a series of feasible matchings X0 ∈ Π_n and will converge after a finite number of steps to a permutation local minimum X* ∈ Π_n of (8). Our first result in this paper provides a sufficient condition for the energy with Hessian W = −B ⊗ A to be conditionally concave. It provides a connection between conditionally positive (or negative) definite functions (Wendland, 2004) and negative definiteness of −B ⊗ A:
Definition 2.
A function Φ : R^d → R is called conditionally positive definite of order m if for all pairwise distinct points {x_i}_{i∈[n]} ⊂ R^d and all 0 ≠ η ∈ R^n satisfying Σ_{i∈[n]} η_i p(x_i) = 0 for all d-variate polynomials p of degree less than m, we have Σ_{i,j=1}^n η_i η̄_j Φ(x_i − x_j) > 0.
Specifically, Φ is conditionally positive definite of order 1 if for all pairwise distinct points {x_i}_{i∈[n]} ⊂ R^d and zero-sum vectors 0 ≠ η ∈ R^n we have Σ_{i,j=1}^n η_i η̄_j Φ(x_i − x_j) > 0. Conditional negative definiteness is defined analogously. Some well-known functions satisfy the above conditions; for example, −‖x‖₂ and −(c² + ‖x‖₂²)^β for β ∈ (0, 1] are conditionally positive definite of order 1, while the functions exp(−τ²‖x‖₂²) for all τ, and c₃₀ = (1 − ‖x‖₂)₊, are conditionally positive definite of order 0 (also called just positive definite functions). Note that if Φ is conditionally positive definite of order m, it is also conditionally positive definite of any order m′ > m. Lastly, as shown in Bogomolny et al. (2007), spherical distances −d(x, x′)^γ are conditionally positive semidefinite for γ ∈ (0, 1], and exp(−τ²d(x, x′)^γ) are positive definite for γ ∈ (0, 1] and all τ. We now prove:
Theorem 1. Let Φ : R^d → R, Ψ : R^s → R be both conditionally positive (or negative) definite functions of order 1.
For any pair of graphs with affinity matrices A, B ∈ R^{n×n} so that

    A_ij = Φ(x_i − x_j),  B_ij = Ψ(y_i − y_j)    (9)

for some arbitrary {x_i}_{i∈[n]} ⊂ R^d, {y_i}_{i∈[n]} ⊂ R^s, the energy E₂(X) is conditionally concave, i.e., its Hessian satisfies M|_{lin(DS)} ≺ 0.
Lemma 1 (orthonormal basis for lin(DS)). If the columns of F ∈ R^{n×(n−1)} constitute an orthonormal basis for the linear space 1⊥ = {x ∈ R^n | x^T 1 = 0}, then the columns of F ⊗ F are an orthonormal basis for lin(DS).
Proof. First, (F ⊗ F)^T(F ⊗ F) = (F^T ⊗ F^T)(F ⊗ F) = (F^T F) ⊗ (F^T F) = I_{n−1} ⊗ I_{n−1} = I_{(n−1)²}. Therefore F ⊗ F is full rank with (n−1)² orthonormal columns. Any column of F ⊗ F is of the form F_i ⊗ F_j, where F_i, F_j are the ith and jth columns of F, respectively. Now, reshaping F_i ⊗ F_j back into an n × n matrix using the inverse of the bracket operation we get X = ]F_i ⊗ F_j[ = F_j F_i^T, which is clearly in lin(DS). Lastly, since the dimension of lin(DS) is (n−1)², the lemma is proved.
Proof (of Theorem 1). Let A, B ∈ R^{n×n} be as in the theorem statement. Checking that E(X) is conditionally concave amounts to restricting the quadratic form −[X]^T(B ⊗ A)[X] to lin(DS): −(F ⊗ F)^T(B ⊗ A)(F ⊗ F) = −(F^T BF) ⊗ (F^T AF) ≺ 0, where we used Lemma 1 and the fact that Φ, Ψ are conditionally positive definite of order 1.
Corollary 1. Let A, B be Euclidean distance matrices; then the solution sets of Problem (8) and GM coincide.

3 Probably conditionally concave energies

Although Theorem 1 covers a rather wide spectrum of instantiations of Problem (8), it definitely does not cover all interesting scenarios.
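Algorithm 1 above is straightforward to implement: a linear function over DS attains its minimum at a permutation (Birkhoff's theorem), so each Frank-Wolfe step is a linear assignment problem, and for a conditionally concave energy the full step t0 = 1 is optimal. Below is a minimal NumPy/SciPy sketch of this loop for E₂ (our own illustration; the paper's experiments use an Auction solver (Bernard et al., 2016) for the assignment step, we use `scipy.optimize.linear_sum_assignment`):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def frank_wolfe_gm(A, B, X0, iters=100):
    """Locally minimize E2(X) = -tr(B X^T A X) over DS, stepping to permutations."""
    X = X0.copy()
    for _ in range(iters):
        G = -2.0 * A @ X @ B              # gradient of E2 at X (A, B symmetric)
        r, c = linear_sum_assignment(G)   # argmin of <G, X> over DS: a permutation
        X1 = np.zeros_like(X)
        X1[r, c] = 1.0
        if np.array_equal(X1, X):         # linear step no longer moves: local min
            break
        X = X1                            # concave along the segment: take t0 = 1
    return X

# Toy usage: B is a relabeled copy of a Euclidean distance matrix A, so the
# permutation P^T (matching G_A to G_B) is a global minimizer here.
rng = np.random.default_rng(0)
pts = rng.standard_normal((12, 2))
A = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
P = np.eye(12)[rng.permutation(12)]
B = P @ A @ P.T
X = frank_wolfe_gm(A, B, np.full((12, 12), 1 / 12))
```

By Corollary 1 the energy is conditionally concave for these affinities, so every iterate after the first is a permutation and no projection step is needed.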
In this section we would like to consider a more general energy E(X) = [X]^T M[X] + a^T[X], X ∈ R^{n×n}, M ∈ R^{n²×n²}, and the optimization problem:

    min_X E(X)    (10a)
    s.t.  X ∈ hull(F)    (10b)

We assume that F = ext(hull(F)), namely, the matchings are extreme points of their convex hull (as happens, e.g., for permutations F = Π_n). When the restricted Hessians M|_{lin(F)} are ε-negative definite (to be defined soon) we will call E(X) probably conditionally concave.
Probably conditionally concave energies E(X) will possess properties (i)-(iii) of conditionally concave energies with high probability. Hence they allow using Frank-Wolfe algorithms, such as Algorithm 1, with no line search (t0 = 1), and achieve local minima in F (no post-processing is required). In addition, we prove that certain classes of probably conditionally concave relaxations have no local minima outside F, with high probability. In the experiments section we will also demonstrate that in practice this algorithm works well for different choices of probably conditionally concave energies. Popular energies that fall into this category are, for example, (3) with A, B geodesic distance matrices or certain functions thereof.
We first make some preparations. Recall the definition of the Grassmannian Gr(d, m): it is the set of d-dimensional linear subspaces in R^m; it is a compact differential manifold defined by the quotient O(m)/(O(d) × O(m − d)), where O(s) is the orthogonal group in R^s. The orthogonal group O(m) acts transitively on Gr(d, m) by taking an orthogonal basis of any d-dimensional linear subspace to an orthogonal basis of a possibly different d-dimensional subspace. On O(m) there exists a Haar probability measure, that is, a probability measure invariant to actions of O(m).
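In practice, Haar-uniform subspaces are easy to sample: the Q factor of an i.i.d. Gaussian matrix, with a standard sign correction on the diagonal of R, has Haar-distributed orthonormal columns. The small sketch below (our own, not from the paper) samples D ∈ Gr(d, m) this way and counts how often the restriction M|_D is positive semidefinite for a spectrum with a ≤ b and p < 1/2:

```python
import numpy as np

def haar_subspace(m, d, rng):
    """Orthonormal basis of a Haar-uniform D in Gr(d, m) via QR of a Gaussian."""
    Q, R = np.linalg.qr(rng.standard_normal((m, d)))
    return Q * np.sign(np.diag(R))  # sign fix makes the law rotation-invariant

rng = np.random.default_rng(2)
m, d = 100, 2
# 2/3 of the eigenvalues are -1 (b = 1), 1/3 are +1/2 (a = 1/2 <= b, p = 1/3).
lam = np.concatenate([-np.ones(2 * m // 3), 0.5 * np.ones(m - 2 * m // 3)])
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
M = (U * lam) @ U.T

hits = 0
for _ in range(2000):
    D = haar_subspace(m, d, rng)
    if np.linalg.eigvalsh(D.T @ M @ D).min() >= 0:  # the event M|_D >= 0
        hits += 1
print(hits)  # convex 2-dimensional restrictions are exponentially rare
```

With these parameters the d = 1 probability is already around 10⁻⁵ and the d = 2 probability is roughly its square, so no hits are expected in a few thousand trials.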
The Haar probability measure on O(m) induces an O(m)-invariant (which we will also call Haar) probability measure on Gr(d, m). We now introduce the notion of ε-negative definite matrices:
Definition 3. A symmetric matrix M ∈ R^{m×m} is called ε-negative definite if the probability of finding a d-dimensional linear subspace D ∈ Gr(d, m) so that M is convex over D is smaller than ε^d. That is, Pr({M|_D ⪰ 0}) ≤ ε^d, where the probability is taken with respect to a Haar O(m)-invariant measure on the Grassmannian Gr(d, m).
One way to interpret M|_D, the restriction of the matrix M to the linear subspace D, is to consider a matrix F ∈ R^{m×d} whose columns form a basis of D and consider M|_D = F^T M F. Clearly, negative definite matrices are ε-negative definite for all ε > 0. The following theorem helps to see what else this definition encapsulates:
Theorem 2. Let M ∈ R^{m×m} be a symmetric matrix with eigenvalues λ₁, ..., λ_m. Then, for all t ∈ (0, 1/(2λ_max)):

    Pr(M|_D ⪰ 0) ≤ ∏_{i=1}^m (1 − 2tλ_i)^{−d/2},    (11)

where M|_D is the restriction of M to the d-dimensional linear subspace defined by D ∈ Gr(d, m) and the probability is taken with respect to the Haar probability measure on Gr(d, m).
Proof. Let F be an m × d matrix of i.i.d. standard normal random variables N(0, 1). Let F_j, j ∈ [d], denote the jth column of F. The multivariate distribution of F is O(m)-invariant in the sense that for a subset A ⊂ R^{m×d}, Pr(RA) = Pr(A) for all R ∈ O(m). Therefore, Pr(M|_D ⪰ 0) = Pr(F^T M F ⪰ 0).
Next, Pr(F^T M F ⪰ 0) ≤ Pr(∩_{j=1}^d {F_j^T M F_j ≥ 0}) = ∏_{j=1}^d Pr(F_j^T M F_j ≥ 0), where the inequality is due to the fact that a positive semidefinite matrix necessarily has a non-negative diagonal, and the equality is due to the independence of the random variables F_j^T M F_j, j ∈ [d]. We now calculate the probability Pr(F₁^T M F₁ ≥ 0), which is the same for all columns j ∈ [d]. For brevity let X = (X₁, X₂, ..., X_m)^T = F₁. Let M = UΛU^T, where U ∈ O(m) and Λ = diag(λ₁, λ₂, ..., λ_m), be the spectral decomposition of M. Since UX has the same distribution as X, we have that Pr(X^T M X ≥ 0) = Pr(X^T ΛX ≥ 0) = Pr(Σ_{i=1}^m λ_i X_i² ≥ 0). Since X_i² ∼ χ²(1) we have transformed the problem into a non-negativity test of a linear combination of chi-squared random variables. Using the Chernoff bound we have for all t > 0:

    Pr(Σ_{i=1}^m λ_i X_i² ≥ 0) ≤ E[exp(t Σ_{i=1}^m λ_i X_i²)] = ∏_{i=1}^m E[exp(tλ_i X_i²)],

where the last equality follows from the independence of X₁, ..., X_m. Taking the product over the d columns yields

    Pr(M|_D ⪰ 0) ≤ ∏_{i=1}^m (1 − 2tλ_i)^{−d/2}.

To finish the proof we note that E[exp(tλ_i X_i²)] is the moment generating function of the random variable X_i² sampled at tλ_i, which is known to be (1 − 2tλ_i)^{−1/2} for tλ_i < 1/2; this means that we can take t < 1/(2λ_i) when λ_i ≠ 0 and disregard all λ_i = 0.
Theorem 2 shows that there is a concentration of measure phenomenon when the dimension m of the matrix M increases. For example, consider
    Λ_{m,p} = diag(λ₁, ..., λ_{(1−p)m}, μ₁, ..., μ_{pm}),    (12)

where λ_i ≤ −b, b > 0, are the (1−p)m negative eigenvalues; 0 ≤ μ_i ≤ a, a > 0, are the pm positive eigenvalues; and the ratio of positive to negative eigenvalues is a constant p ∈ (0, 1/2). We can bound the r.h.s. of (11) with (1 + 2bt)^{−(1−p)m/2}(1 − 2at)^{−pm/2}. Elementary calculus shows that the minimum of this function over t ∈ (0, 1/(2a)) gives:

    Pr(v^T M v ≥ 0) ≤ ( (a^{1−p} b^p / ((a+b)/2)) · (1/2)(1−p)^{p−1} p^{−p} )^{m/2},    (13)

where v is uniformly distributed on the unit sphere in R^m. The function (1/2)(1−p)^{p−1}p^{−p} is shown in the inset and for p < 1/2 it is strictly smaller than 1. The term a^{1−p}b^p / ((a+b)/2) is the ratio of the weighted geometric mean and the arithmetic mean. Using the weighted arithmetic-geometric mean inequality it can be shown that this term is at most 1 if a ≤ b. To summarize, if a ≤ b and p < 1/2, the probability of finding a convex (positive) direction in M is exponentially decreasing in m, the dimension of the matrix. One simple example is taking a = b = 1, p = 0.49, which shows that for the matrices

    U diag(−1, ..., −1, 1, ..., 1) U^T,  with 0.51m entries −1 and 0.49m entries +1,

it will be extremely hard to randomly find a convex direction in dimension m ≈ 300², i.e., the probability will be ≈ 4·10⁻⁵ (this is a low dimension for a matching problem, where m = (n−1)²). Another consequence that comes out of this theorem (in fact, its proof) is that the probability of finding a linear subspace D ∈ Gr(d, m) for which the matrix M is positive semidefinite is bounded by the probability of finding a one-dimensional subspace D₁ ∈ Gr(1, m) to the power of d.
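The right-hand side of (13) is easy to evaluate. The snippet below is our own evaluation of the closed form (the exact figure quoted in the text depends on how m and p are rounded); it shows the exponential decay in m for the a = b = 1, p = 0.49 example:

```python
def bound_13(a, b, p, m):
    """r.h.s. of (13): ((a^(1-p) b^p / ((a+b)/2)) * (1/2)(1-p)^(p-1) p^(-p))^(m/2)."""
    base = (a ** (1 - p) * b ** p) / ((a + b) / 2)
    base *= 0.5 * (1 - p) ** (p - 1) * p ** (-p)
    return base ** (m / 2)

for n in (50, 100, 300):
    m = (n - 1) ** 2  # dimension of lin(DS) for an n-vertex matching problem
    print(n, bound_13(1.0, 1.0, 0.49, m))
```

The base of the exponent is strictly below 1 for p < 1/2 and a ≤ b, so the bound is driven to zero as n grows; a more unbalanced spectrum (smaller p) shrinks it far faster.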
Therefore the d exponent in Definition 3 makes sense; namely, to show that a symmetric matrix M is ε-negative definite it is enough to check one-dimensional linear subspaces. An important implication of this fact, and one of the motivations for Definition 3, is that finding local minima at high-dimensional faces of the polytope hull(F) is much less likely than at low-dimensional faces.
Next, we would like to prove Theorem 3, which shows that for a natural probability space of Hessians {M} the local minima of (10) are with high probability in F, e.g., permutations in case F = Π_n. We therefore need to devise a natural probability space of Hessians. We opt to consider Hessians of the form discussed above, namely

    Ω_m = {UΛ_{m,p}U^T | U ∈ O(m)},    (14)

where Λ_{m,p} is defined in (12). The probability measure over Ω_m is defined using the Haar probability measure on O(m); that is, for a subset A ⊂ Ω_m we define Pr(A) = Pr({U ∈ O(m) | UΛ_{m,p}U^T ∈ A}), where the probability measure on the r.h.s. is the Haar probability measure on O(m). Note that (14) is plausible since the input graphs G_A, G_B are usually provided with an arbitrary ordering of the vertices. Writing the quadratic energy E resulting from a different ordering P, Q ∈ Π_n of the vertices of G_A, G_B (resp.) yields the Hessian H′ = (Q ⊗ P)(B ⊗ A)(Q ⊗ P)^T, where Q ⊗ P ∈ Π_m ⊂ O(m). This motivates defining a Hessian probability space that is invariant to O(m). We prove:
Theorem 3. If the number of extreme points of the polytope hull(F) is bounded by exp(m^{1−ε}), for some fixed arbitrary ε > 0, and the Hessian of E is drawn from the probability space (Ω_m, Pr), the chance that a local minimum of min_{X∈hull(F)} E(X) is outside F is extremely small, bounded by exp(−c₁m), for some constant c₁ > 0.
Proof.
Denote all the edges (i.e., one-dimensional faces) of the polytope P = hull(F) by indices α. Even if every two extreme points of P are connected by an edge, there could be at most exp(2m^{1−ε}) edges. A local minimum X* ∈ P of (10) that is not in F necessarily lies in the (relative) interior of some face f of P of dimension at least one. The restriction of the Hessian M of E(X) to lin(f) is therefore necessarily positive semidefinite. This implies there is a direction v_α ∈ R^m, parallel to an edge α of P, so that v_α^T M v_α ≥ 0.
Let us denote by X_α the indicator random variable that equals one if v_α^T M v_α ≥ 0 and zero otherwise. If X_α = 1 we say that the edge α is a critical edge for M. Let us denote by X = Σ_α X_α the random variable counting critical edges. The expected number of critical edges is E(X) = Σ_α Pr(v_α^T M v_α ≥ 0). We use Theorem 2, in particular (13), to bound the summands.
Since Pr(v_α^T M v_α ≥ 0) = Pr(v_α^T UΛ_{m,p}U^T v_α ≥ 0) and U^T v_α is distributed uniformly on the unit sphere in R^m, we can use (13) to infer that Pr(v_α^T M v_α ≥ 0) ≤ η^{m/2} for some η ∈ [0, 1), and therefore E(X) ≤ exp(m log η / 2) Σ_α 1 (note that log η < 0). Incorporating the bound on the edge number in P discussed above, we get E(X) ≤ exp((log η / 2)m + 2m^{1−ε}) ≤ exp(−c₁m) for some constant c₁ > 0. Lastly, as explained above, the event of a local minimum not in F is contained in {X ≥ 1}, and by Markov's inequality we finally get Pr(X ≥ 1) ≤ E(X) ≤ exp(−c₁m).
Let us use this theorem to show that the locally optimal solutions to Problem (10) with permutations as matchings, F = Π_n, are with high probability permutations:
Theorem 4.
Let E be a quadratic energy with Hessian drawn from the probability space (Ω_m, Pr). The chance that a local minimum of min_{X∈DS} E(X) is outside Π_n is extremely small, bounded by exp(−c₁n²), for some constant c₁ > 0.
Proof. In this case the polytope DS = hull(Π_n) lies in the (n−1)²-dimensional linear subspace lin(DS) of R^{n×n}. It therefore makes sense to consider the Hessians' probability space restricted to lin(DS), that is, considering M|_{lin(DS)} and the orthogonal group acting on it, O((n−1)²). In this case m = (n−1)². The number of vertices of DS is the number of permutations, which by Stirling's bound satisfies n! ≤ exp(1 − n + (n + 1/2) log n) ≤ exp((n−1)^{1.1}). Hence the number of edges is bounded by exp(2(n−1)^{1.1}), as required.
Lastly, Theorems 3 and 4 can be generalized by considering d-dimensional faces of the polytope:
Theorem 5. If the number of extreme points of the polytope hull(F) is bounded by exp(m^{1−ε}), for some fixed arbitrary ε > 0, and the Hessian of E is drawn from the probability space (Ω_m, Pr), the chance that a local minimum of min_{X∈hull(F)} E(X) is in the relative interior of a d-dimensional face of hull(F) is extremely small, bounded by exp(−c₁dm), for some constant c₁ > 0.
This theorem is proved similarly to Theorem 3 by considering indicator variables X_α for positive semidefinite M|_{lin(α)}, where α stands for a d-dimensional face of hull(F). This generalized theorem has a practical implication: local minima are likely to be found on lower-dimensional faces.

4 Graph matching with one-sided permutations

In this section we examine an interesting and popular graph matching (1) instance, where the matchings are the one-sided permutations, namely F = {X ∈ {0,1}^{n×n₀} | X1 = 1}. That is, F are well-defined maps from graph G_A with n vertices to G_B with n₀ vertices.
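For this F, minimizing a linear objective over hull(F) has a one-line solution, which is what makes the concave-search iterations cheap: the constraint set {X ≥ 0, X1 = 1} is a product of simplices, one per row, so the minimizer puts each row's mass on its cheapest column. A short sketch of this closed-form linear step (our own illustration):

```python
import numpy as np

def lp_step_one_sided(G):
    """argmin_X <G, X>  s.t.  X >= 0, X 1 = 1  (row-stochastic X).

    The problem decouples over rows: row i of the optimum is the indicator of
    argmin_j G[i, j], which is also a vertex of hull(F), i.e., a one-sided
    permutation in F.
    """
    n, n0 = G.shape
    X = np.zeros((n, n0))
    X[np.arange(n), G.argmin(axis=1)] = 1.0
    return X

G = np.array([[3., 1., 2.],
              [0., 5., 4.]])
print(lp_step_one_sided(G))  # row 0 selects column 1, row 1 selects column 0
```

Note that, unlike the doubly-stochastic case, no assignment solver is needed here; the step is a single row-wise argmin.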
This modeling is used in the template and partial matching cases. Unfortunately, in this case, standard graph matching energies E(X) are not probably conditionally concave over lin(F). Note that lin(DS) ⊊ lin(F).
We devise a variation of the Frank-Wolfe algorithm using a concave search procedure. That is, in each iteration, instead of a standard line search we subtract a convex energy from E(X) that is constant on F until we find a descent step. This subtraction is a relaxation of the original problem (1) in the sense that it does not alter (up to a global constant) the energy values on F.
The algorithm is summarized in Algorithm 2 and is guaranteed to output a feasible solution in F. The linear program in each iteration over hull(F) has a simple closed-form solution. Also, note that in the inner loop only n different λ values need to be checked. Details can be found in the supplementary materials.

input: X0 ∈ hull(F)
while not converged do
    while energy not reduced do
        add concave energy: Mcurr = M − λΛ;
        compute step: X1 = argmin_{X∈hull(F)} [X0]^T Mcurr[X];
        increase λ;
    end
    Update current solution X0 = X1 and set λ = 0;
end

Algorithm 2: Frank-Wolfe with a concave search.

Figure 1: (a) SHREC07 benchmark: cumulative distribution functions of all errors (left) and mean error per shape (right). (b) Anatomical dataset embedding in the plane. Squares and triangles represent different bone types; lines represent temporal trajectories.

5 Experiments
Bound evaluation: Table 1 evaluates the probability bound (11) for Hessians M ∈ R^{100²×100²} of E₂(X) using affinities A, B defined by functions of geodesic distances on surfaces. Functions that are conditionally negative definite or semi-definite in the Euclidean case: the geodesic distance d(x, y), its square d(x, y)², and the multi-quadratic function (1 + d(x, y)²)^{1/10}.
Functions that are positive definite in the Euclidean case: c30(‖x‖_2) = (1 − ‖x‖_2)_+, c31(‖x‖_2) = (1 − ‖x‖_2)^4_+ (4‖x‖_2 + 1), and exp(−τ^2 ‖x‖_2^2) (note that the last function was used in Vestner et al. (2017)). We also provide the empirical chance of sampling a convex direction. The results in the table are means over all 218 shape pairs in the SHREC07 (Giorgi et al., 2007) shape matching benchmark, with n = 100. The empirical test was conducted using 10^6 random directions sampled from an i.i.d. Gaussian distribution. Note that 0 in the table means numerical zero (below machine precision).

Table 1: Evaluation of probable conditional concavity for different functions of geodesics on lin(DS).

                  Distance   Distance squared   Multi-quadratic   c30   c31   Gaussian
Bound mean        0          7 · 10^−4          0.024             0     0     0
Bound std         0          1.7 · 10^−3        0.021             0     0     0
Empirical mean    0          7 · 10^−5          0.003             0     0     0
Empirical std     0          1.8 · 10^−4        0.003             0     0     0

Initialization: Motivated by Fischler and Bolles (1987) and Kim et al. (2011), and due to the fast running time of the algorithms (e.g., 150 msec for n = 200 with Algorithm 1, and 16 sec with Algorithm 2, both on a single CPU), we sampled multiple initializations based on randomized l-pairs of vertices of graphs G_A, G_B and chose the result corresponding to the best energy. In Algorithm 1 we used the Auction algorithm (Bernard et al., 2016), as in Vestner et al. (2017).

Table 2: Comparison to "convex to concave" methods. The table shows the average and the std of the energy differences.
Positive averages indicate our algorithm achieves lower energy on average.

                 ModelNet10                                        SHREC07
# points         30             60              90                 30             60              90
DSPP             5.0 ± 5.3      9.8 ± 10.8      14.468 ± 19.8      1.3 ± 2.3      9.5 ± 9.5       26.2 ± 24.3
PATH             101.4 ± 53.9   512.3 ± 198.4   1251.9 ± 426.4     69.263 ± 55.9  307.7 ± 230.6   721.0 ± 549.7
RANDOM           197.9 ± 35.2   865.3 ± 122.1   1986.1 ± 273.0     120.2 ± 83.6   532.7 ± 357.8   1230.7 ± 817.6

Comparison with convex-to-concave methods: Table 2 compares our method to Zaslavskiy et al. (2009) and Dym et al. (2017) (PATH and DSPP, respectively). As mentioned in the introduction, these methods solve convex relaxations and then project the minimizer while deforming the energy towards concavity. Our method compares favorably in the task of matching point clouds from the ModelNet10 dataset (Wu et al., 2015) with Euclidean distances as affinities, and the SHREC07 dataset (Giorgi et al., 2007) with geodesic distances. We used F = Π_n and energy (3). The table shows the average and standard deviation of the energy differences between the listed algorithms and ours; the average is taken over 50 random pairs of shapes. Note that positive averages mean our algorithm achieves lower energy on average; the difference to random energy values is given for scale.

Automatic shape matching: We use Algorithm 1 for automatic shape matching (i.e., with no user input or input shape features) on the SHREC07 (Giorgi et al., 2007) dataset according to the protocol of Kim et al. (2011). This benchmark consists of matching 218 pairs of (often extremely) non-isometric shapes in 11 different classes such as humans, animals, planes, and ants.
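The matching protocol samples landmark vertices by farthest point sampling. A minimal sketch of that step (the paper samples on surfaces with geodesic distances; Euclidean distances and the function name are ours, to keep the sketch self-contained):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest point sampling: repeatedly pick the point
    farthest from the set already chosen."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = [int(rng.integers(n))]
    # distance of every point to its nearest chosen sample
    d = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())  # farthest remaining point
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

pts = np.random.default_rng(1).standard_normal((500, 3))
idx = farthest_point_sampling(pts, 8)
assert len(set(idx.tolist())) == 8  # 8 distinct landmark indices
```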
On each shape, we sampled k = 8 points using farthest point sampling and randomized s = 2000 initializations from subsets of l = 3 points. In this stage, we use n = 300 points. We then up-sampled to n = 1500 using the exact algorithm, initialized with our best n = 300 result. The process takes about 16 min per pair running on a single CPU. Figure 1 (a) shows the cumulative distribution functions of the geodesic matching errors (left: all errors; right: mean error per pair) of Algorithm 1 with geodesic distances and their functions c30, c31. We used (3) and F = Π_n. We also show the result of Algorithm 2 with geodesic distances; see details in the supplementary materials. We compare with Blended Intrinsic Maps (BIM) (Kim et al., 2011) and the energies suggested by Boyarski et al. (2017) (heat kernel) and Vestner et al. (2017) (Gaussian of geodesics). For the latter two, we used the same procedure as described above and just replaced the energies with the ones suggested in these works. Note that the Gaussian of geodesics energy of Vestner et al. (2017) falls into the probably concave framework.

Anatomical shape space analysis: We match a dataset of 67 mouse bone surfaces acquired using micro-CT. The dataset consists of eight time series, each capturing the development of one type of bone over time. We use Algorithm 1 to match all pairs in the dataset using Euclidean distance affinity matrices A, B, energy (3), and F = Π_n. After optimization, we calculated a 67 × 67 dissimilarity matrix. Dissimilarities are equivalent to our energy over the permutations (up to an additive constant) and are defined by Σ_{ijkl} X_{ij} X_{kl} (d_{ik} − d_{jl})^2. A color-coded matching example can be seen in the inset. In Figure 1 (b) we used Multi-Dimensional Scaling (MDS) (Kruskal and Wish, 1978) to assign a 2D coordinate to each surface using the dissimilarity matrix. Each bone is shown as a trajectory.
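The embedding step can be reproduced with classical (Torgerson) MDS from a symmetric dissimilarity matrix; the paper does not specify which MDS variant it uses, so the following numpy sketch is purely illustrative:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: double-center the squared dissimilarities and
    embed along the top eigenvectors of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centered points
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]       # keep the top `dim` components
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# sanity check: points on a line are recovered up to a rigid motion
x = np.arange(5, dtype=float)
D = np.abs(x[:, None] - x[None, :])       # exact pairwise distances
Y = classical_mds(D, dim=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
assert np.allclose(D_rec, D, atol=1e-8)   # embedding reproduces D
```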
Note how the embedding separates the two types of bones and maps all bones of the same type to similar time trajectories. This kind of visualization can help biologists analyze their data and possibly find interesting time periods in which bone growth changes. Lastly, note that the Tibia bones (on the right) exhibit an interesting change in the midst of their growth. This particular time was also predicted by the biologists by other means.

6 Conclusion

In this work, we analyze and generalize the idea of concave relaxations for graph matching problems. We concentrate on conditionally concave and probably conditionally concave energies and demonstrate that they provide useful relaxations in practice. We prove that all local minima of such relaxations are with high probability in the original feasible set; this allows removing the standard post-process projection step of relaxation-based algorithms. Another conclusion is that the set of optimal solutions of such relaxations coincides with the set of optimal solutions of the original graph matching problem.

There are popular edge affinity matrices, such as {0, 1} adjacency matrices, that in general do not lead to conditionally concave relaxations. This raises the general question of characterizing broader classes of affinity matrices that furnish (probably) conditionally concave relaxations. Another interesting direction for future work is to obtain information on the quality of local minima for more specific classes of graphs.

7 Acknowledgments

The authors would like to thank Boaz Nadler, Omri Sarig, Vova Kim and Uri Bader for their helpful remarks and suggestions. This research was supported in part by the European Research Council (ERC Consolidator Grant "LiftMatch" 771136) and the Israel Science Foundation (Grant No. 1830/17).
The authors would also like to thank Tomer Stern and Eli Zelzer for the bone scans.

References

Aflalo, Y., Bronstein, A., and Kimmel, R. (2015). On convex relaxation of graph isomorphism. Proceedings of the National Academy of Sciences of the United States of America, 112(10):2942–2947.

Almohamad, H. and Duffuaa, S. O. (1993). A linear programming approach for the weighted graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(5):522–525.

Berg, A. C., Berg, T. L., and Malik, J. (2005). Shape matching and object recognition using low distortion correspondences. In Computer Vision and Pattern Recognition (CVPR), 2005 IEEE Computer Society Conference on, volume 1, pages 26–33. IEEE.

Bernard, F., Theobalt, C., and Moeller, M. (2017). Tighter lifting-free convex relaxations for quadratic matching problems. arXiv preprint arXiv:1711.10733.

Bernard, F., Vlassis, N., Gemmar, P., Husch, A., Thunberg, J., Gonçalves, J. M., and Hertel, F. (2016). Fast correspondences for statistical shape models of brain structures. In Medical Imaging: Image Processing, page 97840R.

Bogomolny, E., Bohigas, O., and Schmit, C. (2007). Distance matrices and isometric embeddings. arXiv preprint arXiv:0710.2063.

Boyarski, A., Bronstein, A., Bronstein, M., Cremers, D., Kimmel, R., Lähner, Z., Litany, O., Remez, T., Rodola, E., Slossberg, R., et al. (2017). Efficient deformable shape correspondence via kernel matching. arXiv preprint arXiv:1707.08991.

Burkard, R. E., Dragoti-Cela, E., Pardalos, P., and Pitsoulis, L. (1998). The quadratic assignment problem. In Handbook of Combinatorial Optimization. Kluwer Academic Publishers.

Coifman, R. R. and Lafon, S. (2006). Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30.

Cour, T., Srinivasan, P., and Shi, J. (2007). Balanced graph matching. In Advances in Neural Information Processing Systems, pages 313–320.

Dym, N., Maron, H., and Lipman, Y. (2017). DS++: A flexible, scalable and provably tight relaxation for matching problems. ACM Transactions on Graphics (TOG), 36(6):184.

Fiori, M. and Sapiro, G. (2015). On spectral properties for graph matching and graph isomorphism problems. Information and Inference: A Journal of the IMA, 4(1):63–76.

Fischler, M. A. and Bolles, R. C. (1987). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. In Readings in Computer Vision, pages 726–740. Elsevier.

Frank, M. and Wolfe, P. (1956). An algorithm for quadratic programming. Naval Research Logistics (NRL), 3(1-2):95–110.

Funkhouser, T. and Shilane, P. (2006). Partial matching of 3D shapes with priority-driven search. In ACM International Conference Proceeding Series, volume 256, pages 131–142.

Giorgi, D., Biasotti, S., and Paraboschi, L. (2007). Shape retrieval contest 2007: Watertight models track. SHREC competition, 8(7).

Guo, Y., Wu, G., Jiang, J., and Shen, D. (2013). Robust anatomical correspondence detection by hierarchical sparse graph matching. IEEE Transactions on Medical Imaging, 32(2):268–277.

Huet, B., Cross, A. D., and Hancock, E. R. (1999). Graph matching for shape retrieval. In Advances in Neural Information Processing Systems, pages 896–902.

Imhof, J.-P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika, 48(3/4):419–426.

Kezurer, I., Kovalsky, S. Z., Basri, R., and Lipman, Y. (2015). Tight relaxation of quadratic matching. In Computer Graphics Forum, volume 34, pages 115–128. Wiley Online Library.

Kim, V. G., Lipman, Y., and Funkhouser, T. (2011). Blended intrinsic maps. In ACM Transactions on Graphics (TOG), volume 30, page 79. ACM.

Kruskal, J. B. and Wish, M. (1978). Multidimensional Scaling, volume 11. Sage.

Loiola, E. M., de Abreu, N. M. M., Boaventura-Netto, P. O., Hahn, P., and Querido, T. (2007). A survey for the quadratic assignment problem. European Journal of Operational Research, 176(2):657–690.

Lyzinski, V., Fishkind, D. E., Fiori, M., Vogelstein, J. T., Priebe, C. E., and Sapiro, G. (2016). Graph matching: Relax at your own risk. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Rodola, E., Torsello, A., Harada, T., Kuniyoshi, Y., and Cremers, D. (2013). Elastic net constraints for shape matching. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1169–1176. IEEE.

Rudelson, M. and Vershynin, R. (2013). Hanson-Wright inequality and sub-gaussian concentration. Electronic Communications in Probability, 18.

Schroff, F., Kalenichenko, D., and Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823.

Umeyama, S. (1988). An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5):695–703.

Vestner, M., Litman, R., Rodola, E., Bronstein, A., and Cremers, D. (2017). Product manifold filter: Non-rigid shape correspondence via kernel density estimation in the product space. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6681–6690. IEEE.

Vogelstein, J. T., Conroy, J. M., Lyzinski, V., Podrazik, L. J., Kratzer, S. G., Harley, E. T., Fishkind, D. E., Vogelstein, R. J., and Priebe, C. E. (2015). Fast approximate quadratic programming for graph matching. PLOS ONE, 10(4):e0121002.

Wendland, H. (2004). Scattered Data Approximation, volume 17. Cambridge University Press.

Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015). 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920.

Zaslavskiy, M., Bach, F., and Vert, J.-P. (2009). A path following algorithm for the graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2227–2242.

Zhou, F. and De la Torre, F. (2012). Factorized graph matching. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 127–134. IEEE.