{"title": "A Condition Number for Joint Optimization of Cycle-Consistent Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 1007, "page_last": 1017, "abstract": "A recent trend in optimizing maps such as dense correspondences between objects or neural networks between pairs of domains is to optimize them jointly. In this context, there is a natural \\textsl{cycle-consistency} constraint, which regularizes composite maps associated with cycles, i.e., they are forced to be identity maps. However, as there is an exponential number of cycles in a graph, how to sample a subset of cycles becomes critical for efficient and effective enforcement of the cycle-consistency constraint. This paper presents an algorithm that selects a subset of weighted cycles to minimize a condition number of the induced joint optimization problem. Experimental results on benchmark datasets justify the effectiveness of our approach for optimizing dense correspondences between 3D shapes and neural networks for predicting dense image flows.", "full_text": "A Condition Number for Joint Optimization of\n\nCycle-Consistent Networks\n\nLeonidas Guibas1, Qixing Huang2, and Zhenxiao Liang2\n\n1Stanford University\n\n2The University of Texas at Austin\n\nAbstract\n\nA recent trend in optimizing maps such as dense correspondences between objects or neural networks between pairs of domains is to optimize them jointly. In this context, there is a natural cycle-consistency constraint, which regularizes composite maps associated with cycles, i.e., they are forced to be identity maps. However, as there is an exponential number of cycles in a graph, how to sample a subset of cycles becomes critical for efficient and effective enforcement of the cycle-consistency constraint. This paper presents an algorithm that selects a subset of weighted cycles to minimize a condition number of the induced joint optimization problem. 
Experimental results on benchmark datasets justify the effectiveness of\nour approach for optimizing dense correspondences between 3D shapes and neural\nnetworks for predicting dense image \ufb02ows.\n\n1\n\nIntroduction\n\nMaps between sets are important mathematical quantities. Depending on the de\ufb01nition of sets, maps\ncan take different forms. Examples include dense correspondences between image pixels [28, 23],\nsparse correspondences between feature points [27], vertex correspondences between social or\nbiological networks [13], and rigid transformations between archaeological pieces [16]. In the\ndeep learning era, the concept of maps naturally extends to neural networks between different\ndomains. A fundamental challenge in map computation is that there is only limited information\nbetween pairs of objects/domains for map computation, and the resulting maps are often noisy and\nincorrect, particularly between relevant but dissimilar objects. A recent trend in map computation\nseeks to address this issue by performing map synchronization, which jointly optimizes maps among\na collection of related objects/domains to improve the maps between pairs of objects/domains in\nisolation [29, 24, 15, 14, 39, 9, 17, 11, 19, 45, 34]. In this context, there is a natural constraint called\ncycle-consistency, which states that composite maps along cycles should be the identity map. When\nmaps admit matrix representations, state-of-the-art techniques [14, 41, 9, 41, 36, 36, 5, 4] formulate\nmap synchronization as recovering a low-rank matrix, where the pairwise maps computed between\npairs of objects are noisy measurements of the blocks of this matrix. This paradigm enjoys tight exact\nrecovery conditions as well as empirical success (e.g.,[14, 41, 9, 41, 36, 36, 5, 4]).\nIn this paper, we focus on the case where maps between objects/domains do not admit matrix\nrepresentations (c.f. [45]), which is a popular setting for neural networks between domains. 
To jointly optimize neural networks in this context, one has to enforce the original cycle-consistency constraint. The technical challenge, though, is that the number of cycles in a graph may be exponential in the number of vertices. In other words, we have to develop a strategy to effectively sample a subset of cycles to enforce the cycle-consistency constraint. The goal for cycle selection is three-fold: completeness, conciseness, and stability. Completeness means that enforcing the cycle-consistency constraint on the selected cycles induces the cycle-consistency property among all cycles in the graph. Conciseness means both the size of the cycle set and the length of each cycle should be small. Stability concerns the convergence behavior when solving the induced joint optimization problem, e.g., joint learning of a network of neural networks. In particular, stability is crucial for\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\ncycle-set selection, as many cycle sets satisfy the first two objectives, yet the numerical behavior of the induced optimization problem turns out to be drastically different. To date, cycle selection is mostly done manually or uses approaches that only consider the first two objectives. In contrast, we introduce an automatic approach that takes all three objectives into account for cycle selection.\nOur cycle selection approach first establishes a stability score based on a condition number of the Hessian matrix of the induced optimization problem. This condition number dictates the local convergence rate as well as the convergence radius of the induced optimization problem. 
Our\napproach then combines semide\ufb01nite programming and importance sampling to select a small subset\nof cycles from an over-complete set of cycles to minimize the condition number.\nWe have evaluated our approach on a variety of settings of optimizing cycle-consistent maps, including\ndense correspondences across hundreds of shapes and neural networks for predicting dense \ufb02ows\nacross natural images. Example results show that cycle selection not only improves the convergence\nrate of the induced optimization problem but also leads to maps with improved map quality.\n\n2 Related Works\n\nOptimizing maps among a collection of objects/domains is a fundamental problem across many\ndifferent scienti\ufb01c domains. In the following, we review works that focus on various optimization\ntechniques, which are most relevant to the context of this paper.\nCycle-consistency constraint. The foundation for joint map computation is the so-called cycle-\nconsistency constraint [20, 14], which states that the composition of correct maps along cycles should\nbe the identity map. There are two widely used formulations of the cycle-consistency constraint.\nLow-rank based techniques utilize matrix representations of maps and the equivalence between cycle-\nconsistent maps and the fact that the matrix that stores pair-wise maps in blocks is low-rank and/or\nsemide\ufb01nite (c.f.[14]). This equivalence leads to a simple formulation of joint map computation via\nlow-rank matrix recovery [41, 24, 15, 14, 17, 10, 48, 36, 26, 18, 3, 33, 32, 1, 6, 2]. Such techniques\nenjoy both empirical success and exact recovery conditions. However, a fundamental limitation of\nsuch techniques is that there must exist matrix map representations, and such assumptions are not\nalways true, e.g., when neural networks encode maps.\nAnother category of methods utilizes spanning trees [20, 16]. 
In a modern context of joint map computation, i.e., recovering accurate maps from maps computed between pairs of objects in isolation, one can seek to recover the correct maps by computing the minimum spanning tree where the induced maps agree with the input maps as much as possible. Most recent combinatorial optimization techniques are based on sampling inconsistent cycles [44, 29, 47]. They formulate map synchronization as removing maps so that each inconsistent cycle contains at least one removed map. However, both techniques are most suited for the task of pruning incorrect maps. They are not suitable for optimizing maps continuously. In contrast, the approach described in this paper combines the strength of both formulations. Precisely, we still formulate map synchronization by minimizing an objective function that combines an observation term and a regularization term. The observation term evaluates the quality of each map based on the training data. The difference is in the regularization term, where we directly enforce the consistency of maps along cycles. This approach is suitable for diverse map representations. However, a fundamental challenge is to obtain a concise and effective cycle-basis, which is the main focus of this paper.\nCycle-basis. Cycle-basis is a well-studied topic on undirected graphs (c.f. [22]). In the standard setting, a cycle-basis consists of a minimum set of cycles of a graph, where all other cycles in the graph are linear combinations of the cycles in the basis. In this paper, we extend this notion to cycle-consistency bases. The goal is to compute a minimum set of cycles, where enforcing consistency along these cycles induces consistency along all cycles of the input graph. Although all cycle-consistency bases are equivalent in the constraints they induce, enforcing different bases during map computation exhibits different numerical behavior. 
The primary goal of this paper is to properly define the condition number of a cycle-consistency basis and develop efficient ways to optimize in the space of cycle-consistency bases to minimize the condition number.\n\n3 Cycle-Consistency Bases\n\nIn this section, we define map graphs and cycle-consistency bases. In Section 4, we discuss how to optimize cycle-consistency bases for joint map optimization. Note that due to the space constraint, we defer the proofs to the supplemental material.\n\nWe first define the notion of a map graph along an undirected graph G = (V, E). Since parametric maps (e.g., neural networks) are oriented, we let the edges in E be oriented. We say G is undirected if and only if ∀(i, j) ∈ E, (j, i) ∈ E.\nDefinition 1 We define a map graph F as an attributed undirected graph G = (V, E) where V = {v_1, . . . , v_{|V|}}. Each vertex v_i ∈ V is associated with a domain D_i. Each edge e = (i, j) ∈ E is associated with a map f_ij : D_i → D_j. In this paper, we assume G is connected. Note that when defining cycle-consistency bases later, we always assume f_ij is an isomorphism between D_i and D_j, and f_ji = f_ij^{-1}.\nTo define cycle-consistency bases on G, we introduce composite maps along cycles of G.\nDefinition 2 Consider a cycle c = (i_1, · · · , i_k, i_1) along G. We define the composite map along c induced from a map graph F as\n\nf_c = f_{i_k i_1} ◦ · · · ◦ f_{i_1 i_2}.   (1)\n\nWhenever we refer to a cycle in this paper, we assume it has no repeating edges.\nDefinition 3 Given a map graph F associated with a graph G, let C be a cycle set of G. We say F is cycle-consistent on C, if\n\nf_c = Id_{D_{i_1}},   ∀c = (i_1 · · · i_k i_1) ∈ C.   (2)\n\nHere Id_X refers to the identity mapping from X to itself. Let C collect all cycles of G. We say F is cycle-consistent, if it is cycle-consistent on C.\nRemark 1 Note that due to the bi-directional consistency in Def. 1, (2) is independent of the starting vertex i_1. In fact, it induces\n\nf_{i_l · · · i_k i_1 · · · i_l} = Id_{D_{i_l}},   1 ≤ l ≤ k.\n\nSince C contains an exponential number of cycles, a natural question is whether we can choose a small subset of cycles C so that every map graph F that is cycle-consistent on C is cycle-consistent. To this end, we need to define the notion of induction:\nDefinition 4 Consider a cycle set C and a cycle c ∉ C. We say C induces c if there exist an ordered cycle set c_1, · · · , c_K ∈ C and intermediate simple cycles c^{(k)}, 1 ≤ k ≤ K, so that (1) c^{(1)} = c_1, (2) c^{(K)} = c, and (3) c^{(k)} = c^{(k−1)} ⊕ c_k, i.e., c^{(k)} is generated by adding the new edges in c_k to c^{(k−1)} while removing their common edges.\n\nAn immediate consequence of Def. 4 is the following:\nFact 1 Given a map graph F, a cycle set C, and another cycle c, if (1) F is cycle-consistent on C, and (2) C induces c, then F is cycle-consistent on {c}.\nRemark 2 It is necessary for Def. 4 to require that c^{(k)}, 1 ≤ k ≤ K are simple cycles. We provide a counterexample in the supplemental material.\n\nThe following proposition shows that Def. 4 is complete.\nProposition 1 Suppose a map graph F is cycle-consistent on a cycle set C. If a cycle c can not be induced from C using the procedure described in Def. 
4, then F may not be cycle-consistent on {c}.\nNow we define the notion of a cycle-consistency basis:\nDefinition 5 We say a cycle set C is a cycle-consistency basis if it induces all other cycles of G.\nThe following proposition characterizes the minimum size of cycle-consistency bases and a procedure for constructing a category of cycle-consistency bases with minimum size.\nProposition 2 The minimum size of a cycle-consistency basis on a connected graph G is |E| − |V| + 1. Moreover, we can construct a minimal cycle-consistency basis from a spanning tree T ⊂ E of G, i.e., by creating a cycle c_e for each edge e = (i, j) ∈ E \\ T, where c_e = (i, j) ∼ p_{ji}, and p_{ji} is the unique path from j to i on T.\n\nRemark 3 A difference between cycle-consistency bases on undirected graphs and path-invariance bases on directed graphs is that the minimum size of cycle-consistency bases is known and is upper bounded by the number of edges. In contrast, computing the minimum size of path-invariance bases of a given directed graph remains an open problem (c.f. [45]).\n\nConnections to cycle bases [22]. When talking about cycles of an undirected graph, there exists a related notion of cycle bases [22]. The difference between cycle-consistency bases and cycle bases lies in the induction procedure. Specifically, cycle bases utilize vectors of edge indicators, i.e., each edge has an orientation, and each cycle corresponds to a sparse vector whose elements are in {1, 0}, indicating which edges belong to c. The induction procedure takes the form of linear combinations of indicator vectors. Depending on the weights allowed in the linear combinations, cycle bases fall into zero-one cycle bases, integral cycle bases, and general cycle bases, which correspond to {−1, 0, 1}, integer, and real weights, respectively. 
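To make Prop. 2 and the ⊕ operation of Def. 4 concrete, here is a minimal Python sketch (illustrative only, not the paper's implementation; it ignores edge orientations and represents a cycle as a set of undirected edges, with the function and variable names being our own):

```python
from collections import deque

def edge(u, v):
    return (min(u, v), max(u, v))

def fundamental_cycles(n, edges, root=0):
    """Prop. 2 sketch: for a spanning tree T, each non-tree edge (i, j)
    yields the cycle c_e = (i, j) followed by the tree path from j to i.
    Returns |E| - n + 1 cycles, each as a set of undirected edges."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, queue = {root: None}, deque([root])
    while queue:                      # breadth-first spanning tree
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    tree = {edge(v, p) for v, p in parent.items() if p is not None}
    basis = []
    for u, v in edges:
        if edge(u, v) in tree:
            continue
        cyc = {edge(u, v)}
        # walk both endpoints up to the root; edges above the lowest
        # common ancestor appear on both walks and cancel out under ^
        for x in (u, v):
            while parent[x] is not None:
                cyc ^= {edge(x, parent[x])}
                x = parent[x]
        basis.append(cyc)
    return basis

# Triangle graph: one non-tree edge, hence a single 3-cycle.
tri = fundamental_cycles(3, [(0, 1), (1, 2), (2, 0)])
assert tri == [{(0, 1), (1, 2), (0, 2)}]

# Def. 4's induction step ⊕ is the symmetric difference of edge sets:
# the two triangles sharing edge (1, 2) induce the 4-cycle 0-1-3-2-0.
c1 = {(0, 1), (1, 2), (0, 2)}
c2 = {(1, 2), (2, 3), (1, 3)}
assert c1 ^ c2 == {(0, 1), (0, 2), (2, 3), (1, 3)}
```

This edge-set view is exactly the binary-weight encoding discussed next: XOR of indicator vectors corresponds to symmetric difference of edge sets.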
Please refer to [22] for more details.\nIt is easy to see that one can use linear combinations of vectors with binary weights to encode the induction procedure of cycle-consistency bases. On the other hand, the reverse is not true.\nProposition 3 A cycle-consistency basis is a zero-one cycle basis. A zero-one cycle basis may not be a cycle-consistency basis.\n\nRemark 4 Unlike zero-one cycle bases, which can be verified in polynomial time by checking linear independence (c.f. [22]), we conjecture that verifying whether a cycle set forms a cycle-consistency basis is NP-hard. In light of this, when optimizing a cycle set we enforce its cycle-consistency property by adding cycles to a minimum cycle-consistency basis.\n\n4 Cycle-Consistency Basis Optimization for Joint Map Optimization\n\nIn this section, we present an approach that optimizes a cycle-consistency basis for a given joint map optimization task. Specifically, we assume each edge (i, j) ∈ E is associated with a parametric map f_ij^{θ_ij} : D_i → D_j, where θ_ij denotes the parameters of this map. We also assume we have a subset of edges E0 ⊆ E. For each edge e ∈ E0, we have a loss term denoted as l_ij(θ_ij) (e.g., we may only have training data among a subset of edges). 
We assume G0 = (V, E0) forms a connected graph. Otherwise, we have insufficient constraints to determine all parametric maps.\nOur main idea is to pre-compute a super set of cycles C_sup (see Section 4.2 for details) and formulate cycle-consistency basis optimization as determining a cycle set C, where C_min ⊆ C ⊆ C_sup, and a weight w_c > 0 for each cycle c ∈ C, for solving the following joint map optimization problem:\n\nmin_Θ Σ_{(i,j)∈E0} l_ij(θ_ij) + Σ_{c=(i_1···i_k i_1)∈C} w_c l_{i_1}(f_c^Θ, Id_{D_{i_1}}).   (3)\n\nHere f_c^Θ = f_{i_k i_1}^{θ_{i_k i_1}} ◦ · · · ◦ f_{i_1 i_2}^{θ_{i_1 i_2}} is the composite map along c, and l_i(·,·) denotes a loss term for comparing self-maps on D_i. For example, l_i(f, f′) := E_{x∼p} d_{D_i}^2(f(x), f′(x)), where d_{D_i}(·,·) is a distance metric of D_i, and where p is an empirical distribution on D_i. Note that (3) essentially enforces the cycle-consistency constraint along C.\nIn the remainder of this section, Section 4.1 introduces a condition number of (3); Section 4.2 describes how to minimize this condition number by selecting and weighting cycles.\n\n4.1 A Condition Number for Cycle-Consistency Map Optimization\n\nWe begin with a simple setting of translation synchronization [18], where pairwise parametric maps are given by translations. We then discuss how to generalize this definition to neural networks. Note that for the particular task of translation synchronization, there exist many other formulations, e.g., [21, 18]. Our goal here is simply to motivate the definition of the condition number. Specifically, consider a pre-computed translation t_ij^0 ∈ R for each edge e = (i, j) ∈ E0. Our goal is to recover translations t_ij, (i, j) ∈ E, by solving the following quadratic minimization problem:\n\nmin_{t_ij, (i,j)∈E} Σ_{(i,j)∈E0} (t_ij − t_ij^0)^2 + Σ_{c=(i_1···i_k i_1)∈C} w_c (Σ_{l=1}^k t_{i_l i_{l+1}})^2,   (4)\n\nwhere we set i_{k+1} := i_1. Given an ordering of the edge set E, let v_e ∈ {0, 1}^{|E|} be the indicator vector for edge e. With v_c = Σ_{l=1}^k v_{(i_l, i_{l+1})} we denote the indicator vector for cycle c = (i_1 · · · i_k i_1). Let t^0 = Σ_{(i,j)∈E0} v_{(i,j)} t_ij^0 and t = Σ_{(i,j)∈E} v_{(i,j)} t_ij. We can rewrite (4) in matrix form as\n\nmin_t t^T H t − 2 t^T t^0 + ‖t^0‖^2,   H := Σ_{e∈E0} v_e v_e^T + Σ_{c∈C} w_c v_c v_c^T.   (5)\n\nWhen solving (5) using gradient-based techniques (which is the case for neural networks), their convergence rates are generally governed by the condition number κ(H) := λ_max(H)/λ_min(H). For example, steepest descent with exact line search admits a linear convergence rate of (κ(H) − 1)/(κ(H) + 1), and this argument also applies to the local convergence behavior of non-linear objective functions (c.f. Sec. 1.3 of [7]). In addition, the deviation between the optimal solution t^⋆ and the ground truth solution t_gt is ‖t^⋆ − t_gt‖ = ‖H^{−1} e‖ ≤ (1/λ_min(H)) ‖e‖, where e = t^0 − t_gt^0 is the vector that encodes the error in the input. Minimizing κ(H) usually leads to an increased value of λ_min, which reduces the error in the output. Thus, we proceed with the following definition:\nDefinition 6 We define the condition number of (3) as the condition number κ(H).\n\nDef. 
6 also generalizes to other parametric maps:\nTheorem 4.1 (Informal) Under mild conditions, the condition number of the Hessian matrix at a local optimum of (3) is O(s κ(H)), where s depends on quantities of the individual maps f_ij^{θ_ij}.\nAnother factor related to an efficient optimization of (3) is the size of C. As there is some fixed overhead for implementing the constraint associated with each cycle, we favor a small size of C. In summary, our goal for cycle-consistency basis optimization is to reduce the condition number of H while simultaneously minimizing the size of the resulting cycle-consistency basis.\n\n4.2 Algorithm for Cycle-Consistency Basis Construction\n\nOur approach for cycle-consistency basis optimization proceeds in three steps. The first step generates the super cycle set C_sup. The second step solves a semidefinite program to optimize weights w_c, c ∈ C_sup, to minimize the condition number of H. The final step controls the size of the resulting cycle-consistency basis via importance sampling.\nC_sup generation. We construct C_sup by computing the breadth-first spanning tree T(v_i) rooted at each vertex v_i ∈ V. For each spanning tree T(v_i), we use the procedure described in Prop. 2 to construct a minimum cycle-consistency basis C(v_i). We define C_sup := ∪_{v_i∈V} C(v_i). We set C_min as the cycle-consistency basis with minimum depth.\nThe resulting C_sup has two desired properties. First, the cycles in C_sup are kept as short as possible. For example, if G is a clique, then C_sup only contains the desired 3-cycles. Second, if G is sparse, then C_sup contains a mixture of short and long cycles. These long cycles can address the issue of accumulated errors if we only enforce the cycle-consistency constraint along short cycles.\nWeight optimization. 
As the condition number of H is minimized if it is a scalar multiple of the identity matrix, we formulate the following semidefinite program for optimizing cycle weights:\n\nmin_{w_c ≥ 0, s_1, s_2} α s_2 − s_1\nsubject to (C1): s_1 I ⪯ Σ_{e∈E0} v_e v_e^T + Σ_{c∈C_sup} w_c v_c v_c^T ⪯ s_2 I,\n(C2): Σ_{c∈C_sup} |v_c|^2 w_c = λ,\n(C3): w_c ≥ δ, ∀c ∈ C_min.   (6)\n\nHere λ characterizes the trade-off between the loss terms and the regularization terms, and α is a hyper-parameter. To motivate this parameter, recall that the condition number is λ_max/λ_min, where λ_max, λ_min are the maximal and minimal eigenvalues of the weighted matrix sum, respectively. But optimizing the ratio of two eigenvalues is non-convex in the variables {w_c}, so we instead reduce the gap between λ_max and λ_min. Different α's yield different eigen-ratios. Hence we test over a set of α's and look for the one with the optimal eigen-ratio. In fact, it is easy to see that for α* = max λ_min/λ_max (the maximum taken over feasible weights), the objective function α* s_2 − s_1 attains its minimum value 0 exactly at the point that minimizes λ_max/λ_min. In most cases the optimal ratio is on the order of 1, so setting α = 1 provides a good enough approximation.\nMoreover, (C3) ensures that cycles in the minimum cycle-consistency basis C_min are also selected. In our experiments, we set λ = 8|E0|/|E| and δ = λ/(8|E|). We solve (6) using the alternating direction method of multipliers (see the supplemental material for details).\nThe cycle indicators, which are sparse vectors, tend to be nearly orthogonal to each other. 
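The role of α can also be seen numerically. The following toy numpy sketch (hypothetical matrices, not the paper's code) evaluates the surrogate α·λ_max − λ_min, i.e., the objective of (6) when the PSD constraints are tight, and checks that with α = 1 it agrees with the condition number on a simple family:

```python
import numpy as np

def gap_objective(H, alpha=1.0):
    """Surrogate alpha * s2 - s1 of (6) with the PSD constraints tight,
    i.e., s1 = lambda_min(H) and s2 = lambda_max(H)."""
    lam = np.linalg.eigvalsh(H)   # eigenvalues in ascending order
    return alpha * lam[-1] - lam[0]

# Toy family H(t): interpolate from an ill-conditioned matrix toward a
# multiple of the identity; both the gap (alpha = 1) and the condition
# number are minimized at the identity-like end.
A = np.diag([10.0, 1.0])          # condition number 10
B = 4.0 * np.eye(2)               # condition number 1
ts = [0.0, 0.5, 1.0]
gaps = [gap_objective((1 - t) * A + t * B) for t in ts]
conds = [np.linalg.cond((1 - t) * A + t * B) for t in ts]
assert np.argmin(gaps) == np.argmin(conds) == 2
```

Unlike the eigenvalue ratio, the gap is concave-minus-linear in the eigenvalue bounds s_1, s_2, which is what makes (6) a semidefinite program.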
The solution to (6) therefore typically yields a matrix with a small condition number.\n\nTheorem 4.2 (Informal) Let s_1* and s_2* be the optimal solution to (6). Then max_c w_c ≤ s_2*, and both s_2* and s_2* − s_1* are upper bounded by quantities that are related to a uniform score of the spherical Voronoi diagram of copies of {v_e, e ∈ E0} ∪ {v_c/‖v_c‖, c ∈ C_sup}.\nImportance sampling. Although the semidefinite program described in (6) controls the condition number of H, it does not control the size of the cycle set with positive weights. Thus we seek to select a subset of cycles C_sample ⊂ C_active := C_sup \\ C_min and compute new weights w̄_c, c ∈ C_sample, so that\n\nΣ_{c∈C_sample} w̄_c v_c v_c^T ≈ Σ_{c∈C_active} w_c v_c v_c^T.   (7)\n\nWe achieve this goal through sampling. Specifically, consider a desired size L for C_sample. Let w_max = max_{c∈C_active} w_c. Choose the maximum α ≤ 1 so that L ≤ Σ_{c∈C_active} (w_c/w_max)^α. We define an independent random variable x_c and a modified weight w̄_c for each cycle c ∈ C_active:\n\nw̄_c = w_c/p_c,   p_c := L · w_c^α / Σ_{c∈C_active} w_c^α,   x_c = 1 with probability p_c and 0 with probability 1 − p_c.   (8)\n\nTo generate C_sample, we simply sample C_active according to x_c. 
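The two ingredients above, the matrix H of (5) and the reweighted sampling of (8), can be sketched as follows (a toy numpy illustration with hypothetical data; for brevity it fixes α = 1 rather than searching for the maximal feasible α, and all names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def build_H(n_edges, observed, cycles, weights):
    """H from (5): sum of v_e v_e^T over observed edges plus the weighted
    sum of v_c v_c^T over cycles (v_* are 0/1 edge-indicator vectors)."""
    H = np.zeros((n_edges, n_edges))
    for e in observed:
        H[e, e] += 1.0                  # v_e v_e^T is a diagonal unit
    for c, w in zip(cycles, weights):
        v = np.zeros(n_edges)
        v[list(c)] = 1.0
        H += w * np.outer(v, v)
    return H

def subsample(weights, L):
    """One draw of (8) with alpha = 1: keep cycle c with probability p_c
    and reweight kept cycles by w_c / p_c (unbiased in expectation)."""
    w = np.asarray(weights, float)
    p = np.minimum(1.0, L * w / w.sum())
    keep = rng.random(w.size) < p
    return keep, np.where(keep, w / p, 0.0)

# Toy instance: 6 edges, all observed, four 3-cycles given by edge indices.
cycles = [(0, 1, 2), (1, 3, 4), (2, 4, 5), (0, 3, 5)]
w = [0.8, 0.6, 0.7, 0.5]
H = build_H(6, range(6), cycles, w)
assert np.allclose(H, H.T) and np.linalg.eigvalsh(H)[0] > 0

# Averaging many reweighted draws approximates the full weighted sum (7).
target = build_H(6, [], cycles, w)
draws = [build_H(6, [], cycles, subsample(w, L=3)[1]) for _ in range(4000)]
assert np.linalg.norm(np.mean(draws, axis=0) - target) < 0.25
```

The final assertion is a Monte Carlo check of the unbiasedness identity stated next; the concentration around the target is what Theorem 4.3 quantifies.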
It is easy to check that\n\nE[|C_sample|] = L,   E[Σ_{c∈C_sample} w̄_c v_c v_c^T] = Σ_{c∈C_active} w_c v_c v_c^T.   (9)\n\nIn the following, we provide concentration inequalities on both quantities:\n\nTheorem 4.3 Given the sampling procedure described in (8) with standard deviations\n\nσ_1 := (Σ_c p_c(1 − p_c))^{1/2},   σ_2 := (Σ_c p_c(1 − p_c) w_c^2)^{1/2}\n\nand conditions σ_1^2 = Ω(1) and σ_2^2 = Ω((max_c w_c)^2), we have, with probability at least 1 − O(1/poly(n)),\n\n||C_sample| − L| ≤ O(log n) σ_1,   ‖Σ_{c∈C_sample} w̄_c v_c v_c^T − Σ_{c∈C_active} w_c v_c v_c^T‖ ≤ O(log n) σ_2.   (10)\n\nNote that (10), which utilizes rank(v_c v_c^T) = 1, is sharper than general concentration bounds [38]. To be more precise, the known bound contains an extra multiplicative term O(log d), where d is the dimension of the matrices involved. In our case d = O(|E|).\n\n5 Experimental Evaluation\n\nIn this section, we present an experimental evaluation of our joint map optimization approach in two application settings: consistent shape maps (c.f. [24, 14, 17, 11]) and dense image correspondence using neural networks (c.f. [46, 45]).\n\n5.1 Consistent Shape Correspondences\n\nExperimental setup. Similar to [39, 40, 17], we encode the map from one shape S_i to another shape S_j as a functional map X_ij : F(S_i) → F(S_j) [31]. The same as [17], we choose each functional space F(S_i) as the linear space spanned by the smallest m = 30 eigenvectors of the co-tangent mesh Laplacian. A functional map X_ij ∈ R^{m×m} essentially encodes a linear map from F(S_i) to F(S_j). We refer to [31] for more details about functional maps.\n\nFigure 1: The top and bottom rows show quantitative evaluations on ShapeCoSeg [42] and PASCAL3D [43], respectively. (a) Cumulative distributions of geodesic errors (ShapeCoSeg) and Euclidean errors (PASCAL3D) of predicted feature correspondences of our approach and baseline approaches. (b) Distributions of cycle weights. (c) Distributions of cycle weights per cycle length.\n\nThe evaluation considers two shape collections from ShapeCoSeg [42]: Alien (200 shapes) and Vase (300 shapes). For each shape collection, we construct G by connecting every shape with k = 25 randomly chosen shapes. The edge set E0 collects for each shape the k0 = 9 closest shapes among the neighbors specified by E via the GPS descriptor [35]. For each edge e = (i, j) ∈ E0, we compute dense correspondences from S_i to S_j using Blended Intrinsic Maps [25]. We then convert this map into the corresponding functional map X_ij^0 using [31].\nFor joint map computation, we obtain improved functional maps X_ij, (i, j) ∈ E, by minimizing\n\n(1/|E|) Σ_{(i,j)∈E} ‖X_ij − X_ij^0‖_F^2 + λ Σ_{c=(i_1···i_{|c|}i_1)∈C} w_c ‖X_{i_{|c|}i_1} · · · X_{i_1 i_2} − I_m‖_F^2.   (11)\n\nFor numerical optimization, we start from the identity map X_ij = I_m, (i, j) ∈ E, and apply steepest descent with exact line search [30]. We run 3000 iterations on each dataset. After optimization, we generate maps between all pairs of shapes by composing maps along shortest paths on G.\nAnalysis of results. To evaluate the quality of shape maps, we report the cumulative distribution (or CD) of normalized geodesic error e_geo of predicted feature correspondences (c.f. [25]). We compare our approach with three state-of-the-art joint shape matching approaches: Huang14 [17], Cosmo17 [11] and Zhang19 [45]. 
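As an aside on the optimizer used above: for a quadratic model such as (5), steepest descent with exact line search has a closed-form step size, and its per-step contraction is governed by (κ(H) − 1)/(κ(H) + 1), which is the link between cycle weighting and convergence speed. A minimal sketch (hypothetical toy problem, not the paper's code):

```python
import numpy as np

def steepest_descent(H, b, x0, iters=200):
    """Exact-line-search steepest descent for f(x) = 0.5 x^T H x - b^T x,
    the quadratic model (5); H is assumed symmetric positive definite."""
    x = x0.astype(float)
    for _ in range(iters):
        r = b - H @ x                  # negative gradient (residual)
        denom = r @ (H @ r)
        if denom <= 1e-15:             # converged: residual (near) zero
            break
        x = x + (r @ r) / denom * r    # exact minimizing step size
    return x

rng = np.random.default_rng(1)
H = np.diag([4.0, 2.0, 1.0])           # condition number 4
b = rng.normal(size=3)
x = steepest_descent(H, b, np.zeros(3))
assert np.allclose(H @ x, b, atol=1e-8)
```

With κ(H) = 4 the error contracts by at least 3/5 per step, so 200 iterations reach machine precision; a larger κ(H) slows this geometric rate accordingly.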
As shown in Figure 1(a), our approach leads to noticeable performance gains over Huang14 and Cosmo17, which leverage low-rank relaxations of the cycle-consistency constraint (c.f. [14]). An explanation is that when the observations are sparse, these low-rank relaxations become loose. In contrast, enforcing the cycle-consistency constraint exactly offers strong regularization. Our approach also outperforms [45] (i.e., by 4.9% when e_geo = 0.1), which employs a related path-invariance constraint by treating G as a directed graph. As we will discuss immediately, the improvement comes from weighted cycles.\nAnalysis of cycle weighting. As shown in Figure 1(a), weighting the cycles has a significant impact on the quality of the optimized maps. When solving (11) with equal weight λ/|C_sup|, the CD percentage drops by 5.2% (when e_geo = 0.15). Note that using more iterations does not close the gap, as there is still a 2.4% difference even after 30000 iterations. This gap also justifies the argument that reducing the condition number alleviates the amplified errors in the solution of (11) that are caused by X_ij^0. Figure 1(b) plots the distribution of cycle weights returned by the SDP formulation. We can see that the SDP formulation leads to sparse and relatively uniform cycle weights, indicating the effectiveness of our approach for selecting important cycles. Moreover, most cycles with positive weights are short (see Figure 1(c)). This behavior coincides with the intuition that G is a dense graph, and utilizing short cycles for optimizing (11) is sufficient.\n\n5.2 Consistent Neural Networks among Multiple Domains\n\nFigure 2: Map graph for image flow.\n\nExperimental setup. In this setting, we consider the task of predicting dense flows between image objects using a neural network (c.f. [12, 46]). We perform experimental evaluation on 12 rigid categories from PASCAL3D [43]. For each category, we construct a map graph G = (V, E), where each vertex v ∈ V represents image objects viewed from similar camera poses, and where each edge represents a dense-flow neural network between some adjacent vertex pairs (to be discussed shortly).\nIn our experiments, we generate V by first picking the dominant view of each category [43] and then sampling a grid of 5 × 5 camera poses. This grid is centered at the dominant view, its two axes align with the latitude and longitude, and its spacing is 22.5°. Similar to [46], we consider both real images from PASCAL3D [43] and synthetic images from ShapeNet [8] for training. For each training image, we allocate it to the four closest vertices in terms of camera poses. We connect an edge between two vertices if the angular distance between their camera poses is less than 35°. All edges use the same network architecture [46]. However, we allow them to take different weights to learn specific features associated with each camera pose pair. Moreover, we set E = E0.\nWe apply (3) to jointly learn the neural networks associated with each edge. Inspired by [12], we use synthetic images to define the loss term associated with each edge. 
In contrast, the cycle-consistency constraint is enforced on real images. To initialize the neural networks, we first pre-train a single network using synthetic images. We then copy the pre-trained weights to all networks. As in joint shape matching, we generate dense image flows between all pairs of images by composing neural networks along shortest paths on G.

At test time, we use [37] to predict a camera pose for each image and associate it with the closest vertex of G. Given two images, we extract the corresponding network to predict dense correspondences.

Analysis of results. For experimental evaluation, we report cumulative distributions of the normalized Euclidean error e_euc (with respect to max(width, height)) of predicted feature correspondences (c.f. [47]). We compare our approach with four state-of-the-art data-driven dense image flow approaches: Zhou15 [47], Dosovitskiy15 [12], Zhou16 [46], and Zhang19 [45]. As shown in Figure 1(a), our approach leads to noticeable performance gains over Zhou15, which only utilizes real images. Likewise, our approach also significantly outperforms Dosovitskiy15 trained on synthetic data alone (i.e., by 8.1% when e_euc = 0.1). This encouraging result shows the potential of leveraging the self-supervision constraint on real images. In addition, our approach is also superior to [46] and [45]. Such improvements are attributed to allowing network parameters to vary across different edges: enforcing identical network parameters results in a 3.2% drop in the CD percentage.

Analysis of cycle weighting. Similar to the case of joint shape matching, using uniform weights to solve (3) leads to a 4.3% drop in the CD percentage, which again shows the advantages of weighting the cycles for both the convergence behavior and the robustness of the solution.
Moreover, the distribution of cycle weights is similar to that of joint shape matching (see Figure 1(b)): the SDP solution returns sparse and uniform cycle weights.

A unique characteristic of this application is that, with a relatively sparse graph, the selected cycles contain a few long cycles (see Figure 1(c)). One explanation is that if all selected cycles were short, the composite networks along long cycles could suffer from accumulated errors. As a consequence, the composite networks between non-adjacent vertices may drift.

Acknowledgement. Qixing Huang would like to acknowledge support from NSF DMS-1700234, a gift from Snap Research, and a hardware donation from NVIDIA. Leonidas Guibas would like to acknowledge NSF grant DMS-1546206, a grant from the Stanford-Toyota AI center, and a Vannevar Bush Faculty Fellowship.

References

[1] Federica Arrigoni, Andrea Fusiello, and Beatrice Rossi. Camera motion from group synchronization. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 546–555. IEEE, 2016.

[2] Federica Arrigoni, Luca Magri, Beatrice Rossi, Pasqualina Fragneto, and Andrea Fusiello. Robust absolute rotation estimation via low-rank and sparse matrix decomposition. In 3D Vision (3DV), 2014 2nd International Conference on, volume 1, pages 491–498. IEEE, 2014.

[3] Federica Arrigoni, Beatrice Rossi, and Andrea Fusiello. Spectral synchronization of multiple views in SE(3). SIAM Journal on Imaging Sciences, 9(4):1963–1990, 2016.

[4] Chandrajit Bajaj, Tingran Gao, Zihang He, Qixing Huang, and Zhenxiao Liang. SMAC: Simultaneous mapping and clustering using spectral decompositions. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 324–333, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.

[5] Afonso S.
Bandeira, Nicolas Boumal, and Amit Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Math. Program., 163(1-2):145–167, May 2017.

[6] Florian Bernard, Johan Thunberg, Peter Gemmar, Frank Hertel, Andreas Husch, and Jorge Goncalves. A solution for multi-alignment by transformation synchronisation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2161–2169, 2015.

[7] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.

[8] Angel X. Chang, Thomas A. Funkhouser, Leonidas J. Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. ShapeNet: An information-rich 3d model repository. CoRR, abs/1512.03012, 2015.

[9] Yuxin Chen, Leonidas J. Guibas, and Qi-Xing Huang. Near-optimal joint object matching via convex relaxation. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 100–108, Beijing, China, 2014. JMLR, Inc.

[10] Yuxin Chen, Leonidas J. Guibas, and Qi-Xing Huang. Near-optimal joint object matching via convex relaxation. In ICML, pages 100–108, 2014.

[11] Luca Cosmo, Emanuele Rodolà, Andrea Albarelli, Facundo Mémoli, and Daniel Cremers. Consistent partial matching of shape collections via sparse modeling. Comput. Graph. Forum, 36(1):209–221, 2017.

[12] Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Häusser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox. FlowNet: Learning optical flow with convolutional networks. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 2758–2766, 2015.

[13] Somaye Hashemifar, Qixing Huang, and Jinbo Xu. Joint alignment of multiple protein-protein interaction networks via convex optimization.
Journal of Computational Biology, 23(11):903–911, 2016.

[14] Qi-Xing Huang and Leonidas Guibas. Consistent shape maps via semidefinite programming. In Proceedings of the Eleventh Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, SGP '13, pages 177–186, Aire-la-Ville, Switzerland, 2013. Eurographics Association.

[15] Qi-Xing Huang, Guo-Xin Zhang, Lin Gao, Shi-Min Hu, Adrian Butscher, and Leonidas Guibas. An optimization approach for extracting and encoding consistent maps in a shape collection. ACM Trans. Graph., 31(6):167:1–167:11, November 2012.

[16] Qixing Huang, Simon Flöry, Natasha Gelfand, Michael Hofer, and Helmut Pottmann. Reassembling fractured objects by geometric matching. ACM Trans. Graph., 25(3):569–578, July 2006.

[17] Qixing Huang, Fan Wang, and Leonidas Guibas. Functional map networks for analyzing and exploring large shape collections. ACM Transactions on Graphics, 33(4):36:1–36:11, July 2014.

[18] Xiangru Huang, Zhenxiao Liang, Chandrajit Bajaj, and Qixing Huang. Translation synchronization via truncated least squares. In NIPS, 2017.

[19] Xiangru Huang, Zhenxiao Liang, Xiaowei Zhou, Yao Xie, Leonidas J. Guibas, and Qixing Huang. Learning transformation synchronization. CoRR, abs/1901.09458, 2019.

[20] Daniel Huber. Automatic Three-dimensional Modeling from Reality. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, December 2002.

[21] Xiaoye Jiang, Lek-Heng Lim, Yuan Yao, and Yinyu Ye. Statistical ranking and combinatorial Hodge theory. Math. Program., 127(1):203–244, March 2011.

[22] Telikepalli Kavitha, Christian Liebchen, Kurt Mehlhorn, Dimitrios Michail, Romeo Rizzi, Torsten Ueckerdt, and Katharina A. Zweig. Survey: Cycle bases in graphs: characterization, algorithms, complexity, and applications. Comput. Sci. Rev., 3(4):199–243, November 2009.

[23] Jaechul Kim, Ce Liu, Fei Sha, and Kristen Grauman.
Deformable spatial pyramid matching for fast dense correspondences. In CVPR, pages 2307–2314. IEEE Computer Society, 2013.

[24] Vladimir Kim, Wilmot Li, Niloy Mitra, Stephen DiVerdi, and Thomas Funkhouser. Exploring collections of 3d models using fuzzy correspondences. ACM Trans. Graph., 31(4):54:1–54:11, July 2012.

[25] Vladimir G. Kim, Yaron Lipman, and Thomas Funkhouser. Blended intrinsic maps. ACM Trans. Graph., 30(4):79:1–79:12, July 2011.

[26] Spyridon Leonardos, Xiaowei Zhou, and Kostas Daniilidis. Distributed consistent data association via permutation synchronization. In ICRA, pages 2645–2652. IEEE, 2017.

[27] Marius Leordeanu and Martial Hebert. A spectral technique for correspondence problems using pairwise constraints. In Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2, ICCV '05, pages 1482–1489, Washington, DC, USA, 2005. IEEE Computer Society.

[28] Ce Liu, Jenny Yuen, and Antonio Torralba. SIFT flow: Dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):978–994, May 2011.

[29] Andy Nguyen, Mirela Ben-Chen, Katarzyna Welnicka, Yinyu Ye, and Leonidas Guibas. An optimization approach to improving collections of shape maps. Computer Graphics Forum, 30:1481–1491, 2011.

[30] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York, NY, 2nd edition, 2006.

[31] Maks Ovsjanikov, Mirela Ben-Chen, Justin Solomon, Adrian Butscher, and Leonidas J. Guibas. Functional maps: a flexible representation of maps between shapes. ACM Trans. Graph., 31(4):30:1–30:11, 2012.

[32] Deepti Pachauri, Risi Kondor, Gautam Sargur, and Vikas Singh. Permutation diffusion maps (PDM) with application to the image association problem in computer vision.
In NIPS, pages 541–549, 2014.

[33] Deepti Pachauri, Risi Kondor, and Vikas Singh. Solving the multi-way matching problem by permutation synchronization. In NIPS, pages 1860–1868, 2013.

[34] David M. Rosen, Luca Carlone, Afonso S. Bandeira, and John J. Leonard. SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group. I. J. Robotics Res., 38(2-3), 2019.

[35] Raif M. Rustamov. Laplace-Beltrami eigenfunctions for deformation invariant shape representation. In Proceedings of the Fifth Eurographics Symposium on Geometry Processing, SGP '07, pages 225–233, Aire-la-Ville, Switzerland, 2007. Eurographics Association.

[36] Yanyao Shen, Qixing Huang, Nati Srebro, and Sujay Sanghavi. Normalized spectral map synchronization. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4925–4933. Curran Associates, Inc., Barcelona, Spain, 2016.

[37] Hao Su, Charles Ruizhongtai Qi, Yangyan Li, and Leonidas J. Guibas. Render for CNN: viewpoint estimation in images using CNNs trained with rendered 3d model views. In ICCV, pages 2686–2694. IEEE Computer Society, 2015.

[38] Joel A. Tropp. An introduction to matrix concentration inequalities. Found. Trends Mach. Learn., 8(1-2):1–230, May 2015.

[39] Fan Wang, Qixing Huang, and Leonidas J. Guibas. Image co-segmentation via consistent functional maps. In Proceedings of the 2013 IEEE International Conference on Computer Vision, ICCV '13, pages 849–856, Washington, DC, USA, 2013. IEEE Computer Society.

[40] Fan Wang, Qixing Huang, Maks Ovsjanikov, and Leonidas J. Guibas. Unsupervised multi-class joint image segmentation. In CVPR, pages 3142–3149. IEEE Computer Society, 2014.

[41] Lanhui Wang and Amit Singer.
Exact and stable recovery of rotations for robust synchronization. Information and Inference: A Journal of the IMA, 2:145–193, December 2013.

[42] Yunhai Wang, Shmulik Asafi, Oliver van Kaick, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen. Active co-analysis of a set of shapes. ACM Trans. Graph., 31(6):165:1–165:10, November 2012.

[43] Yu Xiang, Roozbeh Mottaghi, and Silvio Savarese. Beyond PASCAL: A benchmark for 3d object detection in the wild. In WACV, pages 75–82. IEEE Computer Society, 2014.

[44] Christopher Zach, Manfred Klopschitz, and Marc Pollefeys. Disambiguating visual relations using loop constraints. In CVPR, pages 1426–1433. IEEE Computer Society, 2010.

[45] Zaiwei Zhang, Zhenxiao Liang, Lemeng Wu, Xiaowei Zhou, and Qixing Huang. Path-invariant map networks. CoRR, abs/1812.11647, 2018.

[46] Tinghui Zhou, Philipp Krähenbühl, Mathieu Aubry, Qi-Xing Huang, and Alexei A. Efros. Learning dense correspondence via 3d-guided cycle consistency. In CVPR, pages 117–126, 2016.

[47] Tinghui Zhou, Yong Jae Lee, Stella X. Yu, and Alexei A. Efros. FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In CVPR, pages 1191–1200. IEEE Computer Society, 2015.

[48] Xiaowei Zhou, Menglong Zhu, and Kostas Daniilidis. Multi-image matching via fast alternating minimization. In Proceedings of the IEEE International Conference on Computer Vision, pages 4032–4040, 2015.